Abstract. We prove the following nonholonomic version of the classical Moser
theorem: given a bracket-generating distribution on a connected compact manifold
(possibly with boundary), two volume forms of equal total volume can be isotoped
by the flow of a vector field tangent to this distribution. We describe formal
solutions of the corresponding nonholonomic mass transport problem and present the
Hamiltonian framework for both the Otto calculus and its nonholonomic counterpart
as infinite-dimensional Hamiltonian reductions on diffeomorphism groups.

Finally, we define a nonholonomic analog of the Wasserstein (or, Kantorovich) 
metric on the space of densities and prove that the subriemannian heat equation 
defines a gradient flow on the nonholonomic Wasserstein space with the potential 
given by the Boltzmann relative entropy functional.

1. Introduction



The classical Moser theorem establishes that the total volume is the only invariant 
for a volume form on a compact connected manifold with respect to the diffeomor- 
phism action. In this paper we prove a nonholonomic counterpart of this result and 
present its applications in the problems of nonholonomic optimal mass transport. 

The equivalence for the diffeomorphism action is often formulated in terms of 
"stability" of the corresponding object: the existence of a diffeomorphism relating 
the initial object with a deformed one means that the initial object is stable, as it 
differs from the deformed one merely by a coordinate change. Gray showed in [8] 
that contact structures on a compact manifold are stable. Moser [14] established
stability for volume forms and symplectic structures. A leafwise counterpart of 
Moser's argument for foliations was presented by Ghys in [7], while stability of 
symplectic-contact pairs in transversal foliations was proved in [3]. In this paper 
we establish stability of volume forms in the presence of any bracket-generating
distribution on a connected compact manifold: two volume forms of equal total
volume on such a manifold can be isotoped by the flow of a vector field tangent to 
the distribution. We call this statement a nonholonomic Moser theorem. 

Recall that a distribution $\tau$ on the manifold $M$ is called bracket-generating, or
completely nonholonomic, if local vector fields tangent to $\tau$ and their iterated Lie
brackets span the entire tangent bundle of the manifold $M$. Nonholonomic distributions
arise in various problems related to rolling or skating, wherever the "no-slip"
condition is present. For instance, a ball rolling over a table defines a trajectory in a
configuration space tangent to a nonholonomic distribution of admissible velocities.
Note that such a ball can be rolled to any point of the table and stopped at any
a priori prescribed position. The latter is a manifestation of the Chow-Rashevsky
theorem (see e.g. [13]): for a bracket-generating distribution $\tau$ on a connected
manifold $M$ any two points in $M$ can be connected by a horizontal path (i.e. a path
everywhere tangent to the distribution $\tau$).¹
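
As a concrete illustration of the bracket-generating condition, one may keep in mind the Heisenberg distribution on $\mathbb{R}^3$ with coordinates $(x, y, z)$. This is a standard example that is not taken from the text above; the names $X_1, X_2$ are introduced only for this sketch, and since $\mathbb{R}^3$ is not compact it should be read as a local model.

```latex
\[
X_1 = \partial_x - \tfrac{y}{2}\,\partial_z, \qquad
X_2 = \partial_y + \tfrac{x}{2}\,\partial_z, \qquad
\tau = \mathrm{span}\{X_1, X_2\},
\]
\[
[X_1, X_2]
 = X_1\!\left(\tfrac{x}{2}\right)\partial_z - X_2\!\left(-\tfrac{y}{2}\right)\partial_z
 = \tfrac12\,\partial_z + \tfrac12\,\partial_z
 = \partial_z .
\]
```

The fields $X_1$, $X_2$ and $[X_1, X_2]$ span the tangent space at every point, so this rank 2 distribution is bracket-generating. By contrast, for $\mathrm{span}\{\partial_x, \partial_y\}$ all brackets vanish and every horizontal path stays in a plane $z = \mathrm{const}$.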

Note that for an integrable distribution there is a foliation to which it is tangent 
and a horizontal path always stays on the same leaf of this foliation. Furthermore, 
for an integrable distribution, the existence of an isotopy between volume forms 
requires an infinite number of conditions. On the contrary, the nonholonomic Moser
theorem shows that a non-integrable bracket-generating distribution imposes only
one condition on total volume of the forms for the existence of the isotopy between
them.

¹ The motivation for considering volume forms (or, densities) in a space with distribution can
be related to problems with many tiny rolling balls: it is more convenient to consider the density
of such balls, rather than look at them individually.

Closely related to the nonholonomic Moser theorem is the existence of a nonholonomic
Hodge decomposition, and the corresponding properties of the subriemannian
Laplace operator, see Section 2.4. We also formulate the corresponding nonholonomic
mass transport problem and describe its formal solutions as projections of
horizontal geodesics on the diffeomorphism group for the $L^2$-Carnot-Carathéodory
metric.

In order to give this description, we first present the Hamiltonian framework for 
what is now called the Otto calculus - the Riemannian submersion picture for the 
problems of optimal mass transport. It turns out that the submersion properties can 
be naturally understood as an infinite-dimensional Hamiltonian reduction on diffeo- 
morphism groups, and this admits a generalization to the nonholonomic setting. We 
define a nonholonomic analog of the Wasserstein metric on the space of densities. 
Finally, we extend Otto's result on the heat equation and prove that the subrieman- 
nian heat equation defines a gradient flow on the nonholonomic Wasserstein space 
with potential given by the Boltzmann relative entropy functional. 

2. Around Moser's theorem 

2.1. Classical and nonholonomic Moser theorems. The main goal of this section
is to prove the following nonholonomic version of the classical Moser theorem.
Consider a distribution $\tau$ on a compact manifold $M$ (without boundary unless
otherwise stated).

Theorem 2.1. Let $\tau$ be a bracket-generating distribution, and $\mu_0$, $\mu_1$ be two volume
forms on $M$ with the same total volume: $\int_M \mu_0 = \int_M \mu_1$. Then there exists a
diffeomorphism $\phi$ of $M$ which is the time-one-map of the flow $\phi_t$ of a non-autonomous
vector field $V_t$ tangent to the distribution $\tau$ everywhere on $M$ for every $t \in [0,1]$,
such that $\phi^*\mu_1 = \mu_0$.

Note that the existence of the "nonholonomic isotopy" $\phi_t$ is guaranteed by the
only condition on equality of total volumes for $\mu_0$ and $\mu_1$, just like in the classical
case:

Theorem 2.2. [13] Let $M$ be a manifold without boundary, and $\mu_0$, $\mu_1$ are two
volume forms on $M$ with the same total volume: $\int_M \mu_0 = \int_M \mu_1$. Then there exists
a diffeomorphism $\phi$ of $M$, isotopic to the identity, such that $\phi^*\mu_1 = \mu_0$.

Remark 2.3. The classical Moser theorem has numerous variations and general- 
izations, some of which we would like to mention. 

a) Similarly one can show that not only the identity, but any diffeomorphism of
$M$ is isotopic to a diffeomorphism which pulls back $\mu_1$ to $\mu_0$.

b) The Moser theorem also holds for a manifold $M$ with boundary. In this case a
diffeomorphism is a time-one-map for a (non-autonomous) vector field $V_t$ on $M$,
tangent to the boundary $\partial M$.

c) Moser also proved in [14] a similar statement for a pair of symplectic forms on
a manifold $M$: if two symplectic structures can be deformed to each other among
symplectic structures in the same cohomology class on $M$, this deformation can
be carried out by a flow of diffeomorphisms of $M$.

Below we describe to which degree these variations extend to the nonholonomic 
case. 

2.2. The Moser theorem for a fibration. Apparently, the most straightforward
generalization of the classical Moser theorem is its version "with parameters." In this
case, volume forms on $M$ smoothly depend on parameters and have the same total
volume at each value of this parameter: $\int_M \mu_0(s) = \int_M \mu_1(s)$ for all $s$. The theorem
guarantees that the corresponding diffeomorphism exists and depends smoothly on
this parameter $s$.

The following theorem can be regarded as a modification of the parameter version: 

Theorem 2.4. Let $\pi : N \to B$ be a fibration of an $n$-dimensional manifold $N$ over
a $k$-dimensional base manifold $B$. Suppose that $\mu_0$, $\mu_1$ are two smooth volume forms
on $N$. Assume that the pushforwards of these $n$-forms to $B$ coincide, i.e. they give
one and the same $k$-form on $B$: $\pi_*\mu_0 = \pi_*\mu_1$. Then, there exists a diffeomorphism
$\phi$ of $N$ which is the time-one-map of a (non-autonomous) vector field $V_t$ tangent
everywhere to the fibers of this fibration and such that $\phi^*\mu_1 = \mu_0$.

Remark 2.5. Note that in this version the volume forms are given on the ambient
manifold $N$, while in the parametric version of the Moser theorem we are given
fiberwise volume forms. There is also a similar version of this theorem for a foliation,
cf. e.g. [7]. In either case, for the corresponding diffeomorphism to exist the volume
forms have to satisfy infinitely many conditions (the equality of the total volumes
as functions in the parameter $s$ or as the pushforwards $\pi_*\mu_0$ and $\pi_*\mu_1$). The case of
a fibration (or a foliation) corresponds to an integrable distribution $\tau$, and presents
the "opposite case" to a bracket-generating distribution. Unlike the case of an
integrable distribution, the existence of the corresponding isotopy between volume
forms in the bracket-generating case imposes only one condition, the equality of the
total volumes of the two forms (regardless, e.g., of the distribution growth vector at
different points of the manifold).

2.3. Proofs. First, we recall a proof of the classical Moser theorem. To show how 
the proof changes in the nonholonomic case, we split it into several steps. 

Proof. 1) Connect the volume forms $\mu_0$ and $\mu_1$ by a "segment" $\mu_t = \mu_0 + t(\mu_1 - \mu_0)$,
$t \in [0,1]$. We will be looking for a diffeomorphism $g_t$ sending $\mu_t$ to $\mu_0$: $g_t^*\mu_t = \mu_0$. By
taking the $t$-derivative of this equation, we get the following "homological equation"
on the velocity $V_t$ of the flow $g_t$: $g_t^*(L_{V_t}\mu_t + \partial_t\mu_t) = 0$, where $\partial_t g_t(x) = V_t(g_t(x))$.
This is equivalent to

$$L_{V_t}\mu_t = \mu_0 - \mu_1,$$

since $\partial_t\mu_t = -(\mu_0 - \mu_1)$.

By rewriting $\mu_0 - \mu_1 = \rho_t\mu_t$ for an appropriate function $\rho_t$, we reformulate the
equation $L_{V_t}\mu_t = \rho_t\mu_t$ as the problem $\mathrm{div}_{\mu_t} V_t = \rho_t$ of looking for a vector field
$V_t$ with a prescribed divergence $\rho_t$. Note that the total integral of the function $\rho_t$
(relative to the volume $\mu_t$) over $M$ vanishes, which manifests the equality of total
volumes for $\mu_t$.
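
The facts used in this step can be checked directly from the definition of the segment $\mu_t$. The following computation is only a sketch in the notation of the proof; it assumes (as is implicit in the statement) that $\mu_0$ and $\mu_1$ define the same orientation of $M$.

```latex
\begin{align*}
\mu_t &= (1-t)\,\mu_0 + t\,\mu_1
  && \text{a convex combination of two volume forms, hence a volume form for } t\in[0,1],\\
\int_M \rho_t\,\mu_t &= \int_M (\mu_0-\mu_1)
  = \int_M \mu_0 - \int_M \mu_1 = 0
  && \text{zero mean of } \rho_t \text{ with respect to } \mu_t,\\
\frac{d}{dt}\bigl(g_t^*\mu_t\bigr)
  &= g_t^*\bigl(L_{V_t}\mu_t + \partial_t\mu_t\bigr)
  && \text{so } g_t^*\mu_t \equiv \mu_0 \text{ exactly when the bracket vanishes.}
\end{align*}
```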

2) We omit the index $t$ for now and consider a Riemannian metric on $M$ whose
volume form is $\mu$. We are looking for a required field $V$ with prescribed divergence
among gradient vector fields $V = \nabla u$, which "transport the mass" in the fastest
way. This leads us to the elliptic equation $\mathrm{div}_\mu(\nabla u) = \rho$, i.e. $\Delta u = \rho$, where the
Laplacian $\Delta$ is defined by $\Delta u := \mathrm{div}_\mu \nabla u$ and depends on the Riemannian metric
on $M$.

3) The key part of the proof is the following 

Lemma 2.6. The Poisson equation $\Delta u = \rho$ on a compact Riemannian manifold
$M$ is solvable for any function $\rho$ with zero mean: $\int_M \rho\,\mu = 0$ (with respect to the
Riemannian volume form $\mu$).

Proof of Lemma. Describe the space $\mathrm{Coker}\,\Delta := (\mathrm{Im}\,\Delta)^{\perp_{L^2}}$, i.e. find the space of
all functions $h$ which are $L^2$-orthogonal to the image $\mathrm{Im}\,\Delta$. By applying integration
by parts twice, one has:

$$0 = \langle h, \Delta u\rangle_{L^2} = -\langle \nabla h, \nabla u\rangle_{L^2} = \langle \Delta h, u\rangle_{L^2}$$

for all smooth functions $u$ on $M$. Then such functions $h$ must be harmonic, and
hence they are constant functions on $M$: $(\mathrm{Im}\,\Delta)^{\perp_{L^2}} = \{\mathrm{const}\}$. Since the image
$\mathrm{Im}\,\Delta$ is closed, it is the $L^2$-orthogonal complement of the space of constant functions:
$\mathrm{Im}\,\Delta = (\{\mathrm{const}\})^{\perp_{L^2}}$. The condition of orthogonality to constants is exactly the
condition of zero mean for $\rho$: $\langle \mathrm{const}, \rho\rangle_{L^2} = \mathrm{const}\cdot\int_M \rho\,\mu = 0$. Thus the equation $\Delta u = \rho$
has a weak solution for $\rho$ with zero mean, and the ellipticity of $\Delta$ implies that the
solution is smooth for a smooth function $\rho$. □

4) Now, take $V_t := \nabla u_t$ and let $g^t_V$ be the corresponding flow on $M$. Since $M$
is compact and $V_t$ is smooth, the flow exists for all time $t$. The diffeomorphism
$\phi := g^1_V$, the time-one-map of the flow $g^t_V$, gives the required map which pulls back
the volume form $\mu_1$ to $\mu_0$: $\phi^*\mu_1 = \mu_0$. □

Proof of Theorem 2.4, the Moser theorem for a fibration.

We start by defining the new volume form on the fibres $F$ using the pushforward
$k$-form $\nu_0 := \pi_*\mu_0$ on the base $B$ and the volume $n$-form $\mu_0$ on $N$. Namely, consider
the pull-back $k$-form $\pi^*\nu_0$ to $N$. Then there is a unique $(n-k)$-form $\mu_0^F$ on fibers such
that $\mu_0^F \wedge \pi^*\nu_0 = \mu_0$. Similarly we find $\mu_1^F$. Due to the equality of the pushforwards
$\pi_*\mu_0$ and $\pi_*\mu_1$, the total volumes of $\mu_0^F$ and $\mu_1^F$ are fiberwise equal. Hence by
the Moser theorem applied to the fibres, there is a smooth vector field tangent to
the fibers, smoothly depending on a base point, and whose flow sends one of the
$(n-k)$-forms, $\mu_1^F$, to the other, $\mu_0^F$. This field is defined globally on $N$, and hence
its time-one-map pulls back $\mu_1$ to $\mu_0$. □



Now we turn to a nonholonomic distribution on a manifold. 

Proof of Theorem 2.1, the nonholonomic version of the Moser theorem.

1) As before, we connect the forms by a segment $\mu_t$, $t \in [0,1]$, and we come to the
same homological equation. The latter reduces to $\mathrm{div}_\mu V = \rho$ with $\int_M \rho\,\mu = 0$, but
the equation now is for a vector field $V$ tangent to the distribution $\tau$.

2) Consider some Riemannian metric on $M$. Now we will be looking for the required
field $V$ in the form $V := P^\tau\nabla u$, where $P^\tau$ is a pointwise orthogonal projection
of tangent vectors to the planes of our distribution $\tau$.

We obtain the equation $\mathrm{div}_\mu(P^\tau\nabla u) = \rho$. Rewrite this equation by introducing
the sub-Laplacian $\Delta^\tau u := \mathrm{div}_\mu(P^\tau\nabla u)$ associated to the distribution $\tau$ and the
Riemannian metric on $M$. The equation on the potential $u$ becomes $\Delta^\tau u = \rho$.

3) An analog of Lemma 2.6 is now as follows.

Proposition 2.7. a) The sub-Laplacian operator $\Delta^\tau u := \mathrm{div}_\mu(P^\tau\nabla u)$ is a
self-adjoint hypoelliptic operator. Its image is closed in $L^2$.

b) The equation $\Delta^\tau u = \rho$ on a compact Riemannian manifold $M$ is solvable for
any function $\rho$ with zero mean: $\int_M \rho\,\mu = 0$.

Proof of Proposition. a) The principal symbol $\delta^\tau$ of the operator $\Delta^\tau$ is the sum of
squares of vector fields forming a basis for the distribution $\tau$: $\delta^\tau = \sum_i X_i^2$, where
$X_i$ form a horizontal orthonormal frame for $\tau$. This is exactly the Hörmander
condition of hypoellipticity [9] for the operator $\Delta^\tau$. The self-adjointness follows
from the properties of projection and integration by parts. The closedness of the
image in $L^2$ follows from the results of pjH [20].

b) We need to find the condition of weak solvability in $L^2$ for the equation $\Delta^\tau u = \rho$.
Again, we are looking for all those functions $h$ which are $L^2$-orthogonal to the image
of $\Delta^\tau$ (or, which is the same, in the kernel of this operator):

$$0 = \langle h, \Delta^\tau u\rangle_{L^2} = \langle h, \mathrm{div}_\mu(P^\tau\nabla u)\rangle_{L^2}$$

for all smooth functions $u$ on $M$. In particular, this should hold for $u = h$. Integrating
by parts we come to

$$0 = \langle h, \mathrm{div}_\mu(P^\tau\nabla h)\rangle_{L^2} = -\langle \nabla h, P^\tau\nabla h\rangle_{L^2} = -\langle P^\tau\nabla h, P^\tau\nabla h\rangle_{L^2},$$

where in the last equality we used the projection property $(P^\tau)^2 = P^\tau = (P^\tau)^*$.
Then $P^\tau\nabla h = 0$ on $M$, and hence the equation $\Delta^\tau u = \rho$ is solvable for any function
$\rho \perp_{L^2} \{h \mid P^\tau\nabla h = 0\}$. We claim that all such functions $h$ are constant on $M$.
Indeed, the condition $P^\tau\nabla h = 0$ means that $L_X h = 0$ for any horizontal field
$X$, i.e. a field tangent to the distribution $\tau$. But then $h$ must be constant along
any horizontal path, and due to the Chow-Rashevsky theorem it must be constant
everywhere on $M$. Thus the functions $\rho$ must be $L^2$-orthogonal to all constants,
and hence they have zero mean. This implies that the equation $\mathrm{div}_\mu(P^\tau\nabla u) = \rho$ is
solvable for any $L^2$ function $\rho$ with zero mean. For a smooth $\rho$ the solution is also
smooth due to hypoellipticity of the operator. □

4) Now consider the horizontal field $V_t := P^\tau\nabla u_t$. As before, the time-one-map
of its flow exists for the smooth field $V_t$ on the compact manifold $M$, and it gives
the required diffeomorphism $\phi$. □



2.4. The nonholonomic Hodge decomposition and sub-Laplacian. According
to the classical Helmholtz-Hodge decomposition, any vector field $W$ on a Riemannian
manifold $M$ can be uniquely decomposed into the sum $W = V + U$, where
$V = \nabla f$ and $\mathrm{div}_\mu U = 0$. Proposition 2.7 suggests the following nonholonomic Hodge
decomposition of vector fields on a manifold with a bracket-generating distribution:

Proposition 2.8. 1) For a bracket-generating distribution $\tau$ on a Riemannian
manifold $M$, any vector field $W$ on $M$ can be uniquely decomposed into the sum
$W = V + U$, where the field $V = P^\tau\nabla f$ and it is tangent to the distribution $\tau$,
while the field $U$ is divergence-free: $\mathrm{div}_\mu U = 0$. Here $P^\tau$ is the pointwise orthogonal
projection to $\tau$.

2) Moreover, if the vector field $W$ is tangent to the distribution $\tau$ on $M$, then
$W = V + U$, where $V = P^\tau\nabla f \parallel \tau$ as before, while the field $U$ is divergence-free,
tangent to $\tau$, and $L^2$-orthogonal to $V$, see Figure 1.

Proof. Let $\rho := \mathrm{div}_\mu W$ be the divergence of $W$ with respect to the Riemannian
volume $\mu$. First, note that $\int_M \rho\,\mu = 0$. Indeed, $\int_M (\mathrm{div}_\mu W)\,\mu = \int_M L_W\mu = 0$, since
the volume of $\mu$ is defined in a coordinate-free way, and does not change along the
flow of the field $W$.

Now, apply Proposition 2.7 to find a solution of the equation $\mathrm{div}_\mu(P^\tau\nabla f) = \rho$. The
field $V := P^\tau\nabla f$ is defined uniquely. Then the field $U := W - V$ is divergence-free,
which proves 1).

For a field $W \parallel \tau$, we define $V := P^\tau\nabla f$ in the same way. Note that $V \parallel \tau$ as well.
Then $U := W - V$ is both tangent to $\tau$ and divergence-free. Furthermore,

$$\langle U, V\rangle_{L^2} = \langle U, P^\tau\nabla f\rangle_{L^2} = \langle P^\tau U, \nabla f\rangle_{L^2} = \langle U, \nabla f\rangle_{L^2} = -\langle \mathrm{div}_\mu U, f\rangle_{L^2} = 0,$$

where we used the properties of $U$ established above: $U \parallel \tau$ and $\mathrm{div}_\mu U = 0$. □

Above we defined a sub-Laplacian $\Delta^\tau u := \mathrm{div}_\mu(P^\tau\nabla u)$ for a function $u$ on a
Riemannian manifold $M$ with a distribution $\tau$.

Figure 1. A nonholonomic Hodge decomposition.

Proposition 2.9. The sub-Laplacian $\Delta^\tau$ depends only on a subriemannian
metric on the distribution $\tau$ and a volume form in the ambient manifold $M$.

Proof. Note that the operator $P^\tau\nabla$ on a function $u$ is the horizontal gradient $\nabla^\tau$ of
$u$, i.e. the vector of the fastest growth of $u$ among the directions in $\tau$. If one chooses
a local orthonormal frame $X_1, \dots, X_k$ in $\tau$, then $P^\tau\nabla u = \sum_{i=1}^k (L_{X_i}u)\,X_i$. Thus the
definition of the horizontal gradient relies on the subriemannian metric only.

The sub-Laplacian $\Delta^\tau\psi = \mathrm{div}_\mu(P^\tau\nabla\psi)$ needs also the volume form $\mu$ in the
ambient manifold to take the divergence with respect to this form. □
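
In a local orthonormal frame the sub-Laplacian can be written out explicitly. The computation below is a sketch that is not part of the original argument; it uses only the frame expression for the horizontal gradient and the Leibniz rule $\mathrm{div}_\mu(fX) = f\,\mathrm{div}_\mu X + L_X f$.

```latex
\begin{align*}
\Delta^\tau u
 &= \mathrm{div}_\mu\Bigl(\sum_{i=1}^k (L_{X_i}u)\,X_i\Bigr) \\
 &= \sum_{i=1}^k \Bigl( L_{X_i}L_{X_i}u \;+\; (\mathrm{div}_\mu X_i)\,L_{X_i}u \Bigr).
\end{align*}
```

Both ingredients of the proposition are visible here: the frame $X_1, \dots, X_k$ encodes the subriemannian metric and determines the second-order part $\sum_i X_i^2$, while the volume form enters only through the first-order coefficients $\mathrm{div}_\mu X_i$.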

The corresponding nonholonomic heat equation $\partial_t u = \Delta^\tau u$ is also defined by the
subriemannian metric and a volume form.



2.5. The case with boundary. For a manifold $M$ with non-empty boundary $\partial M$
and two volume forms $\mu_0$, $\mu_1$ of equal total volume, the classical Moser theorem
establishes the existence of a diffeomorphism $\phi$ which is the time-one-map for the flow
of a field $V_t$ tangent to $\partial M$ and such that $\phi^*\mu_1 = \mu_0$.

The existence of the required gradient field $V_t = \nabla u$ is guaranteed by the following

Lemma 2.10. Let $\mu$ be a volume form on a Riemannian manifold $M$ with boundary
$\partial M$. The Poisson equation $\Delta u = \rho$ with Neumann boundary condition $\frac{\partial u}{\partial n} = 0$ on
the boundary $\partial M$ is solvable for any function $\rho$ with zero mean: $\int_M \rho\,\mu = 0$.

Here $\frac{\partial}{\partial n}$ is the differentiation in the direction of outer normal $n$ on the boundary.

Proof of Lemma. Proceed in the same way as in Lemma 2.6 to find all functions $h$
$L^2$-orthogonal to the image $\mathrm{Im}\,\Delta$. The first integration by parts gives:

$$0 = \int_M h\,(\Delta u)\,\mu = -\int_M \langle\nabla h, \nabla u\rangle\,\mu + \int_{\partial M} h\,\frac{\partial u}{\partial n}\,\mu_{\partial M} = -\int_M \langle\nabla h, \nabla u\rangle\,\mu,$$

where $\mu_{\partial M}$ denotes the induced volume form on the boundary and in the last equality
we used the Neumann boundary conditions. The second integration by parts gives:

$$0 = \int_M (\Delta h)\,u\,\mu - \int_{\partial M} \frac{\partial h}{\partial n}\,u\,\mu_{\partial M}.$$

This equation holds for all smooth functions $u$ on $M$, so any such function $h$ must be
harmonic in $M$ and satisfy the Neumann boundary condition $\frac{\partial h}{\partial n} = 0$. Hence, these
are constant functions on $M$: $(\mathrm{Im}\,\Delta)^{\perp_{L^2}} = \{\mathrm{const}\}$. This gives the same description
as in the no-boundary case: the image $\mathrm{Im}\,\Delta$ with the Neumann condition consists
of functions $\rho$ with zero mean. □

Geometrically, the Neumann boundary condition means that there is no flux of
density through the boundary $\partial M$: $0 = \frac{\partial u}{\partial n} = n\cdot\nabla u = n\cdot V$ on $\partial M$.
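
The zero-mean condition on $\rho$ is also necessary for any field with no flux through the boundary. A one-line check via the divergence theorem is sketched below, assuming (as in step 2 of the classical proof) that $\mu$ is the Riemannian volume form; the symbol $\mu_{\partial M}$ for the induced volume form on $\partial M$ is notation introduced here.

```latex
\[
\int_M \rho\,\mu
 = \int_M (\mathrm{div}_\mu V)\,\mu
 = \int_{\partial M} (n\cdot V)\,\mu_{\partial M}
 = 0
 \qquad \text{whenever } n\cdot V = 0 \text{ on } \partial M .
\]
```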

For distributions on manifolds with boundary, the solution of the Neumann prob- 
lem becomes a much more subtle issue, as the behavior of the distribution near the 
boundary affects the flux of horizontal fields across the boundary, and hence the 
solvability in this problem. However, there is a class of domains in length spaces for 
which the solvability of the Neumann problem was established. 

Let $LS$ be a length space with the distance function $d(x,y)$, defined as infimum
of lengths of continuous curves joining $x, y \in LS$. Consider domains in this space
with the property that sufficiently close points in those domains can be joined by 
a not very long path which does not get too close to the domain boundary. The 
formal definition is as follows. 

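Definition 2.11. An open set $\Omega \subset LS$ is called an $(\varepsilon,\delta)$-domain if there exist $\delta > 0$
and $0 < \varepsilon \le 1$ such that for any pair of points $p, q \in \Omega$ with $d(p,q) \le \delta$ there is a
continuous rectifiable curve $\gamma : [0,T] \to \Omega$ starting at $p$ and ending at $q$ such that
the length $\ell(\gamma)$ of the curve $\gamma$ satisfies

$$\ell(\gamma) \le \frac{1}{\varepsilon}\,d(p,q)$$

and

$$\min\{d(p,z),\, d(q,z)\} \le \frac{1}{\varepsilon}\,d(z,\partial\Omega)$$

for all points $z$ on the curve $\gamma$.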

A large source of $(\varepsilon,\delta)$-domains is given by some classes of open sets in Carnot
groups, where the Carnot group itself is regarded as a length space with the
Carnot-Carathéodory distance, defined via the lengths of admissible (i.e. horizontal) paths,
see e.g. [15]. There is a natural notion of diameter (or, radius) for domains in length
spaces.

Theorem 2.12. Let $\tau$ be a bracket-generating distribution on a subriemannian manifold
$M$ with smooth boundary $\partial M$, and $\mu_0$, $\mu_1$ be two volume forms on $M$ with the
same total volume: $\int_M \mu_0 = \int_M \mu_1$. Suppose that the interior of $M$ is an
$(\varepsilon,\delta)$-domain of positive diameter.

Then there exists a diffeomorphism $\phi$ of $M$ which is the time-one-map of the flow
$\phi_t$ of a non-autonomous vector field $V_t$ tangent to the distribution $\tau$ everywhere on
$M$ and to the boundary $\partial M$ for every $t \in [0,1]$, such that $\phi^*\mu_1 = \mu_0$.

The proof immediately follows from the result on solvability of the corresponding
Neumann problem $\Delta^\tau u = \rho$ with $n\cdot(P^\tau\nabla u)|_{\partial M} = 0$ (or, which is the same,
$\frac{\partial u}{\partial (P^\tau n)}\big|_{\partial M} = 0$) for such domains, established in [TBI US] (cf. Theorem 1.5 in [15]).



3. Distributions on diffeomorphism groups 

3.1. A fibration on the group of diffeomorphisms. Let $\mathcal{D}$ be the group of all
(orientation-preserving) diffeomorphisms of a manifold $M$. Its Lie algebra $\mathfrak{X}$ consists
of all smooth vector fields on $M$. The tangent space to the diffeomorphism group
at any point $\phi \in \mathcal{D}$ is given by the right translation of the Lie algebra $\mathfrak{X}$ from the
identity $\mathrm{id} \in \mathcal{D}$ to $\phi$:

$$T_\phi\mathcal{D} = \{X\circ\phi \mid X \in \mathfrak{X}\}.$$

Fix a volume form $\mu$ of total volume 1 on the manifold $M$. Denote by $\mathcal{D}_\mu$ the
subgroup of volume-preserving diffeomorphisms, i.e. the diffeomorphisms preserving
the volume form $\mu$. The corresponding Lie algebra $\mathfrak{X}_\mu$ is the space of all vector fields
on the manifold $M$ which are divergence-free with respect to the volume form $\mu$.

Let $\mathcal{W}$ be the set of all smooth normalized volume forms in $M$, which is called the
(smooth) Wasserstein space. Consider the projection map $\pi^{\mathcal{D}} : \mathcal{D} \to \mathcal{W}$ defined by
the push forward of the fixed volume form $\mu$ by the diffeomorphism $\phi$, i.e. $\pi^{\mathcal{D}}(\phi) = \phi_*\mu$.
The projection $\pi^{\mathcal{D}} : \mathcal{D} \to \mathcal{W}$ defines a natural structure of a principal bundle on
$\mathcal{D}$ whose structure group is the subgroup $\mathcal{D}_\mu$ of volume-preserving diffeomorphisms
of $M$ and fibers $F$ are right cosets for this subgroup in $\mathcal{D}$. Two diffeomorphisms
$\phi$ and $\tilde\phi$ lie in the same fiber if they differ by a composition (on the right) with a
volume-preserving diffeomorphism: $\tilde\phi = \phi\circ s$, $s \in \mathcal{D}_\mu$.

On the group $\mathcal{D}$ we define two vector bundles Ver and Hor whose spaces at a
diffeomorphism $\phi \in \mathcal{D}$ consist of right translated divergence-free fields

$$\mathrm{Ver}_\phi := \{X\circ\phi \mid \mathrm{div}_{\phi_*\mu} X = 0\}$$

and gradient fields

$$\mathrm{Hor}_\phi := \{\nabla f\circ\phi \mid f \in C^\infty(M)\},$$

respectively. Note that the bundle Ver is defined by the fixed volume form $\mu$, while
Hor requires a Riemannian metric.²

² The metric on $M$ does not need to have the volume form $\mu$. In the general case, $\mathfrak{X}_\mu$ consists
of vector fields divergence-free with respect to $\mu$, while the gradients are considered for the chosen
metric on $M$.

Proposition 3.1. The bundle Ver of translated divergence-free fields is the bundle
of vertical spaces $T_\phi F$ for the fibration $\pi^{\mathcal{D}} : \mathcal{D} \to \mathcal{W}$. The bundle Hor over $\mathcal{D}$ defines
a horizontal distribution for this fibration $\pi^{\mathcal{D}}$.

Proof. Let $\phi_t$ be a curve in a fibre of $\pi^{\mathcal{D}} : \mathcal{D} \to \mathcal{W}$ emanating from the point $\phi_0 = \phi$.
Then $\phi_t = \phi_0\circ s_t$, where $s_0 = \mathrm{id}$ and $s_t$ are volume-preserving diffeomorphisms for
each $t$. Let $X_t$ be a family of divergence-free vector fields, such that $\partial_t s_t = X_t\circ s_t$.

Then the vector tangent to the curve $\phi_t = \phi_0\circ s_t$ is given by

$$\frac{d}{dt}\Big|_{t=0}(\phi_0\circ s_t) = (\phi_{0*}X_0)\circ\phi_0.$$

Since $X_0$ is divergence-free with respect to $\mu$, $\phi_{0*}X_0$ is divergence-free
with respect to $\phi_{0*}\mu$. Hence, any vector tangent to the fibre through $\phi$ is
given by $X\circ\phi$, where $X$ is a divergence-free field with respect to the form $\phi_*\mu$.

By the Hodge decomposition of vector fields, we have the direct sum
$T\mathcal{D} = \mathrm{Hor}\oplus\mathrm{Ver}$. □

Remark 3.2. The classical Moser theorem 2.2 can be thought of as the existence
of path-lifting property for the principal bundle $\pi^{\mathcal{D}} : \mathcal{D} \to \mathcal{W}$: any deformation of
volume forms can be traced by the corresponding flow, i.e. a path on the diffeomorphism
group, projected to the deformation of forms. Its proof shows that this
path lifting property holds and has the uniqueness property in the presence of the
horizontal distribution defined above by using the Hodge decomposition. Namely,
given any path $\mu_t$ starting at $\mu_0$ in the smooth Wasserstein space $\mathcal{W}$ and a point
$\phi_0$ in the fibre $(\pi^{\mathcal{D}})^{-1}\mu_0$, there exists a unique path $\phi_t$ in the diffeomorphism group
which is tangent to the horizontal bundle Hor, starts at $\phi_0$, and projects to $\mu_t$, see
Figure 2.



3.2. A nonholonomic distribution on the diffeomorphism group. Let $\tau$ be
a bracket-generating distribution on the manifold $M$. Consider the right-invariant
distribution $\mathcal{T}$ on the diffeomorphism group $\mathcal{D}$ defined at the identity $\mathrm{id} \in \mathcal{D}$ of
the group by the subspace in $\mathfrak{X}$ of all those vector fields which are tangent to the
distribution $\tau$ everywhere on $M$:

$$\mathcal{T}_\phi = \{V\circ\phi \mid V(x) \in \tau_x \text{ for all } x \in M\}.$$



Proposition 3.3. The infinite-dimensional distribution $\mathcal{T}$ is a non-integrable
distribution in $\mathcal{D}$. Horizontal paths in this distribution are flows of non-autonomous
vector fields tangent to the distribution $\tau$ on manifold $M$.

Proof. To see that the distribution $\mathcal{T}$ is non-integrable we consider two horizontal
vector fields $V$ and $W$ on $M$ and the corresponding right-invariant vector fields
$\tilde V$ and $\tilde W$ on $\mathcal{D}$. Then their bracket at the identity of the group is (minus) their
commutator as vector fields $V$ and $W$ in $M$. This commutator does not belong to
the plane $\mathcal{T}_{\mathrm{id}}$ since the distribution $\tau$ is non-integrable, and at least somewhere on
$M$ the commutator of horizontal fields $V$ and $W$ is not horizontal.

The second statement immediately follows from the definition of $\mathcal{T}$. □

Figure 2. The Moser theorem in both the classical and nonholonomic
settings is a path-lifting property in the diffeomorphism group.



Remark 3.4. Consider now the projection map $\pi^{\mathcal{D}} : \mathcal{D} \to \mathcal{W}$ in the presence of
the distribution $\mathcal{T}$ on $\mathcal{D}$. The path lifting property in this case is a restatement of
the nonholonomic Moser theorem. Namely, for a curve $\{\mu_t \mid \mu_0 = \mu\}$ in the space
$\mathcal{W}$ of smooth densities Theorem 2.1 proves that there is a curve $\{g^t \mid g^0 = \mathrm{id}\}$ in $\mathcal{D}$,
everywhere tangent to the distribution $\mathcal{T}$ and projecting to $\{\mu_t\}$: $\pi^{\mathcal{D}}(g^t) = \mu_t$.

Recall that in the classical case the corresponding path lifting becomes unique once
we fix the gradient horizontal bundle $\mathrm{Hor}_\phi \subset T_\phi\mathcal{D}$ for any diffeomorphism $\phi \in \mathcal{D}$.
Similarly, in the nonholonomic case we consider the spaces of gradient projections
instead of the gradient spaces: $\mathrm{Hor}^\tau_{\mathrm{id}} := \{P^\tau\nabla f \mid f \in C^\infty(M)\}$, where $P^\tau$ stands for
the orthogonal projection onto the distribution $\tau$ in a given Riemannian metric on
$M$. The right-translated gradient projections $\mathrm{Hor}^\tau_\phi := \{(P^\tau\nabla f)\circ\phi \mid f \in C^\infty(M)\}$
define a horizontal bundle for the principal bundle $\mathcal{D} \to \mathcal{W}$ by nonholonomic Hodge
decomposition. (Note also that in both classical and nonholonomic cases, the obtained
horizontal distributions on $\mathcal{D}$ are nonintegrable, cf. [18]. Indeed, the Lie
bracket of two gradient fields is not necessarily a gradient field, and similarly for
gradient projections. Hence there are no horizontal sections of the bundle $\mathcal{D} \to \mathcal{W}$,
tangent to these horizontal gradient distributions.)

As we will see in later sections, both gradient fields $\{\nabla f\}$ in the classical case
and gradient projections $\{P^\tau\nabla f\}$ in the nonholonomic case allow one to move the
densities in the "fastest way", and are important in transport problems of finding
an optimal ("shortest") path between densities.



3.3. Accessibility of diffeomorphisms and symplectic structures. Presum- 
ably, even a stronger statement holds: 

Conjecture 3.5. Every diffeomorphism in the diffeomorphism group $\mathcal{D}$ can be
accessed by a horizontal path tangent to the distribution $\mathcal{T}$.

This conjecture can be thought of as an analog of the Chow-Rashevsky theorem in
the infinite-dimensional setting of the group of diffeomorphisms, provided that the
distribution $\mathcal{T}$ is bracket-generating on $\mathcal{D}$. Note, however, that the Chow-Rashevsky
theorem is unknown in the general setting of an infinite-dimensional manifold;
there are only "approximate" analogs of it, e.g. on a Hilbert manifold.

A proof of this conjecture on accessibility of all diffeomorphisms by flows of vector
fields tangent to a nonholonomic distribution would imply the nonholonomic
Moser theorem 2.1 on volume forms. Moreover, it would also imply the following
nonholonomic version of the Moser theorem on symplectic structures from [14].

Conjecture 3.6. Suppose that on a manifold $M$ two symplectic structures $\omega_0$ and $\omega_1$
from the same cohomology class can be connected by a path of symplectic structures
in the same class. Then for a bracket-generating distribution $\tau$ on $M$ there exists a
diffeomorphism $\phi$ of $M$ which is the time-one-map of a non-autonomous vector field
$V_t$ tangent to the distribution $\tau$ everywhere on $M$ and for every $t \in [0,1]$, such that
$\phi^*\omega_1 = \omega_0$.

This conjecture follows from the one above: one would consider the diffeomorphism
from the classical Moser theorem, and realize it by the horizontal path
(tangent to the distribution $\mathcal{T}$) on the diffeomorphism group, which exists if
Conjecture 3.5 holds.



4. The Riemannian geometry of diffeomorphism groups and mass transport

The differential geometry of diffeomorphism groups is closely related to the theory 
of optimal mass transport, and in particular, to the problem of moving one density 
to another while minimizing certain cost on a Riemannian manifold. In this section, 
we review the corresponding metric properties of the diffeomorphism group and the 
space of volume forms.

4.1. Optimal mass transport. Let $M$ be a compact Riemannian manifold without
boundary (or, more generally, a complete metric space) with a distance function
$d$. Let $\mu$ and $\nu$ be two Borel probability measures on the manifold $M$ which are
absolutely continuous with respect to the Lebesgue measure. Consider the following
optimal mass transport problem: find a Borel map $\phi : M \to M$ that pushes the measure
$\mu$ forward to $\nu$ and attains the infimum of the $L^2$-cost functional $\int_M d^2(x,\phi(x))\,\mu$
among all such maps.

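The set of all Borel probability measures is called the Wasserstein space. The
minimal cost of transport defines a metric d on this space:

(4.1)  $d^2(\mu, \nu) := \inf_{\phi} \Big\{ \int_M d^2(x, \phi(x))\,\mu \;\Big|\; \phi_*\mu = \nu \Big\}.$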

This mass transport problem admits a unique solution (defined up to measure
zero sets), called an optimal map (see [5] for $M = \mathbb{R}^n$ and [TJ] for any compact
connected Riemannian manifold M without boundary). Furthermore, there exists
a 1-parameter family of Borel maps $\phi_t$ starting at the identity map $\phi_0 = \mathrm{id}$, ending
at the optimal map $\phi_1 = \phi$ and such that $\phi_t$ is the optimal map pushing $\mu$ forward
to $\nu_t := \phi_{t*}\mu$ for any $t \in (0, 1)$. The corresponding 1-parameter family of measures
$\nu_t$ describes a geodesic in the Wasserstein space of measures with respect to the
distance function d and is called the displacement interpolation between $\mu$ and $\nu$,
see [22] for details. (More generally, in mass transport problems one can replace $d^2$
in the above formula by a cost function $c : M \times M \to \mathbb{R}$, while we mostly focus on
the case $c = d^2/2$ and its subriemannian analog.)
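
As a small numerical illustration of the $L^2$-cost problem, consider two empirical
measures on the real line with equally many atoms of equal weight. The sketch below
relies on a standard fact that is not proved in this paper: for the quadratic cost on
the line the optimal map is the monotone rearrangement, i.e. sorted atoms are matched
in order. The function names are hypothetical, and the displacement interpolation is
obtained by moving each atom along a straight segment.

```python
import math

def optimal_matching_1d(xs, ys):
    """Monotone matching of two equally weighted samples on the real line.

    Assumption (not from the text): for the cost |x - y|^2 on R the optimal
    map between two empirical measures with equally many atoms pairs the
    sorted points in order.
    """
    if len(xs) != len(ys):
        raise ValueError("samples must have the same size")
    return list(zip(sorted(xs), sorted(ys)))

def wasserstein2_1d(xs, ys):
    """Square root of the minimal average squared displacement."""
    pairs = optimal_matching_1d(xs, ys)
    return math.sqrt(sum((x - y) ** 2 for x, y in pairs) / len(pairs))

def displacement_interpolation_1d(xs, ys, t):
    """Atoms of the interpolating measure: each atom moves along a segment."""
    return [(1 - t) * x + t * y for x, y in optimal_matching_1d(xs, ys)]

mu = [0.0, 1.0, 3.0]
nu = [5.0, 2.0, 3.0]   # a copy of mu shifted by 2, listed out of order
print(wasserstein2_1d(mu, nu))
print(displacement_interpolation_1d(mu, nu, 0.5))
```

Here the second sample is a translate of the first, so the computed distance equals
the size of the shift, and the midpoint measure is the first sample shifted halfway.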

In what follows, we consider a smooth version of the Wasserstein space, cf. Section
3.1. Recall that the smooth Wasserstein space $\mathcal{W}$ consists of smooth volume forms
with the total integral equal to 1. One can consider an infinite-dimensional manifold
structure on the smooth Wasserstein space, a (weak) Riemannian metric $\langle\,,\rangle^{\mathcal{W}}$
corresponding to the distance function d, and geodesics on this space. Similar to
the finite-dimensional case, geodesics on the smooth Wasserstein space $\mathcal{W}$ can be
formally defined as projections of trajectories of the Hamiltonian vector field with
the "kinetic energy" Hamiltonian in the tangent bundle $T\mathcal{W}$.



4.2. The Otto calculus. For a Riemannian manifold M both spaces $\mathcal{D}$ and $\mathcal{W}$
can be equipped with (weak) Riemannian structures, i.e. can be formally regarded
as infinite-dimensional Riemannian manifolds. (One can consider
$H^s$-diffeomorphisms and $H^{s-1}$-forms of Sobolev class $s > n/2 + 1$. Both sets can
be considered as smooth Hilbert manifolds. However, this is not applicable in the
subriemannian case, discussed later, hence we confine ourselves to the $C^\infty$ setting,
which is applicable in both cases.)

From now on we fix a Riemannian metric $\langle\,,\rangle^M$ on the manifold M, whose
Riemannian volume is the form $\mu$. On the diffeomorphism group we define a Riemannian
metric $\langle\,,\rangle^{\mathcal{D}}$ whose value at a point $\phi \in \mathcal{D}$ is given by

(4.2)  $\langle X_1 \circ \phi, X_2 \circ \phi \rangle^{\mathcal{D}} := \int_M \langle X_1 \circ \phi(x), X_2 \circ \phi(x) \rangle^M\,\mu.$

The action along a curve (or "energy" of a curve) $\{\phi_t \mid t \in [0,1]\} \subset \mathcal{D}$ in this metric
is defined in the following straightforward way:

$E(\{\phi_t\}) = \int_0^1 dt \int_M \langle \partial_t \phi_t, \partial_t \phi_t \rangle^M\,\mu.$

If M is flat, $\mathcal{D}$ is locally isometric to the (pre-)Hilbert $L^2$-space of (smooth)
vector-functions $\phi$, see e.g. [21]. The following proposition is well-known.

Proposition 4.1. Let $\phi_t$ be a geodesic on the diffeomorphism group $\mathcal{D}$ with respect
to the above Riemannian metric $\langle\,,\rangle^{\mathcal{D}}$, and $V_t$ be the (time-dependent) velocity field
of the corresponding flow: $\partial_t \phi_t = V_t \circ \phi_t$. Then the velocity $V_t$ satisfies the inviscid
Burgers equation on M:

$\partial_t V_t + \nabla_{V_t} V_t = 0,$

where $\nabla_{V_t} V_t$ stands for the covariant derivative of the field $V_t$ on M along itself.

Proof. In the flat case the geodesic equation is $\partial_t^2 \phi_t = 0$: this is the Euler-Lagrange
equation for the action functional E. Differentiate $\partial_t \phi_t = V_t \circ \phi_t$ with respect to
time t and use this geodesic equation to obtain

(4.3)  $\partial_t V_t \circ \phi_t + (\nabla V_t \circ \phi_t)\,\partial_t \phi_t = 0.$

After another substitution $\partial_t \phi_t = V_t \circ \phi_t$, the latter becomes

$(\partial_t V_t + \nabla_{V_t} V_t) \circ \phi_t = 0,$

which is equivalent to the Burgers equation.

The non-flat case involves differentiation in the Levi-Civita connection on M and
leads to the same Burgers equation, see details in [6, 11]. □

Remark 4.2. Smooth solutions of the Burgers equation correspond to non-interacting
particles on the manifold M flying along those geodesics on M which are defined by
the initial velocities $V_0(x)$. The Burgers flows have the form $\phi_t(x) = \exp^M(tV_0(x))$,
where $\exp^M : TM \to M$ is the Riemannian exponential map on M.
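
The remark can be checked numerically in the simplest flat case M = R, where
$\exp^M(tV_0(x)) = x + tV_0(x)$. The following sketch is only an illustration under
assumptions of our own choosing: the initial profile `v0` is hypothetical, the
Eulerian velocity is recovered by inverting the flow with bisection, and the Burgers
equation is tested by central finite differences. For this profile the flow stays
invertible for t < 2, after which particles collide.

```python
import math

def v0(x):
    """Initial velocity profile (a hypothetical choice for illustration)."""
    return 0.5 * math.sin(x)

def flow(x, t):
    """Flat-case Burgers flow: each particle moves with constant velocity."""
    return x + t * v0(x)

def label_of(y, t, lo=-10.0, hi=10.0):
    """Invert y = flow(x, t) by bisection; valid while flow(., t) is increasing."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if flow(mid, t) < y:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def velocity(y, t):
    """Eulerian velocity V_t(y) = V_0(x), where x is the particle now at y."""
    return v0(label_of(y, t))

def burgers_residual(y, t, h=1e-4):
    """Central-difference estimate of d_t V + V d_y V at the point (y, t)."""
    dv_dt = (velocity(y, t + h) - velocity(y, t - h)) / (2 * h)
    dv_dy = (velocity(y + h, t) - velocity(y - h, t)) / (2 * h)
    return dv_dt + velocity(y, t) * dv_dy

print(burgers_residual(0.7, 0.5))   # small, up to finite-difference error
```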

Proposition 4.3. [18] The bundle projection $\pi^{\mathcal{D}} : \mathcal{D} \to \mathcal{W}$ is a Riemannian
submersion of the metric $\langle\,,\rangle^{\mathcal{D}}$ on the diffeomorphism group $\mathcal{D}$ to the Riemannian metric
$\langle\,,\rangle^{\mathcal{W}}$ on the smooth Wasserstein space $\mathcal{W}$ for the $L^2$-cost. The horizontal (i.e.
normal to fibers) spaces in the bundle $\mathcal{D} \to \mathcal{W}$ are right-translated gradient fields.

Recall that for two Riemannian manifolds Q and B, a Riemannian submersion
$\pi : Q \to B$ is a mapping onto B which has maximal rank and preserves lengths of
horizontal tangent vectors to Q, see e.g. [17]. For a bundle $Q \to B$, this means that
there is a distribution of horizontal spaces on Q, orthogonal to the fibers, which is
projected isometrically to the tangent spaces to B. One of the main properties of a
Riemannian submersion gives the following feature of geodesics:

Corollary 4.4. Any geodesic, initially tangent to a horizontal space on the full
diffeomorphism group $\mathcal{D}$, always remains horizontal, i.e. tangent to the horizontal
distribution. There is a one-to-one correspondence between geodesics on the base
$\mathcal{W}$ starting at the measure $\mu$ and horizontal geodesics in $\mathcal{D}$ starting at the identity
diffeomorphism id.

Remark 4.5. In PDE terms, the horizontality of a geodesic means that a solution
of the Burgers equation with a potential initial condition remains potential
forever. This also follows from the Hamiltonian formalism and the moment map
geometry discussed in the next section. Since horizontal geodesics in the group $\mathcal{D}$
correspond to geodesics on the density space $\mathcal{W}$, potential solutions of the Burgers
equation (corresponding to horizontal geodesics) move the densities in the fastest
way. The corresponding time-one-maps for Burgers potential solutions provide
optimal maps for moving the density $\mu$ to any other density $\nu$, see [5, 12].

The Burgers potential solutions have the form $\phi_t(x) = \exp^M(-t\nabla f(x))$ as long
as the right-hand side is smooth. The time-one-map $\phi_1$ for the flow $\phi_t$ provides
an optimal map between probability measures if the function f is a $(d^2/2)$-concave
function. The notion of c-concavity for a cost function c on M is defined as follows.
For a function f its c-transform is $f^c(y) = \inf_{x \in M} (c(x, y) - f(x))$, and the function f
is said to be c-concave if $f^{cc} = f$. Here, we consider the case $c = d^2/2$. The family
of maps $\phi_t$ defines the displacement interpolation mentioned in Section 4.1.
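
The c-transform is easy to experiment with on a finite set of points. The sketch
below is an illustration only: the grid, the test function, and the function names
are hypothetical, and the infimum over M is replaced by a minimum over the grid. It
checks two general facts that follow directly from the definition: $f^{cc} \ge f$
pointwise, and $f^{ccc} = f^c$, so that a c-transform is always c-concave.

```python
def c_transform(f, points, cost):
    """Discrete c-transform: f^c(y) = min over x of (cost(x, y) - f(x))."""
    return [min(cost(x, y) - fx for x, fx in zip(points, f)) for y in points]

def quadratic_cost(x, y):
    """The cost c = d^2 / 2 for the Euclidean distance on the line."""
    return 0.5 * (x - y) ** 2

grid = [i / 10 for i in range(-20, 21)]      # sample points in [-2, 2]
f = [abs(x) ** 3 - x for x in grid]          # an arbitrary test function
fc = c_transform(f, grid, quadratic_cost)
fcc = c_transform(fc, grid, quadratic_cost)
fccc = c_transform(fcc, grid, quadratic_cost)

# f^cc dominates f, and a third transform returns f^c (up to rounding).
print(all(a >= b - 1e-9 for a, b in zip(fcc, f)))
print(max(abs(a - b) for a, b in zip(fccc, fc)))
```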

Let $\theta$ and $\nu$ be volume forms with the same total volume and let g and h be
functions on the manifold M defined by $\theta = g\,\mathrm{vol}$ and $\nu = h\,\mathrm{vol}$, where vol is the
Riemannian volume form. Then a diffeomorphism $\phi$ moving one density to the other
($\phi_*\theta = \nu$) satisfies $h(\phi(x)) \det(D\phi(x)) = g(x)$, where $D\phi$ is the Jacobi matrix of the
diffeomorphism $\phi$. In the flat case the optimal map is gradient, $\phi = \nabla f$, and the
corresponding convex potential f satisfies the Monge-Ampere equation

$\det(\mathrm{Hess} f(x)) = \frac{g(x)}{h(\nabla f(x))},$

since $D(\nabla f) = \mathrm{Hess} f$. In the non-flat case, the optimal map is $\phi(x) = \exp^M(-\nabla f(x))$
for a $(d^2/2)$-concave potential f, and the equation is Monge-Ampere-like, see [12, 22]
for details. Below we describe the corresponding nonholonomic analogs of these
objects.
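
In dimension one the flat Monge-Ampere equation reads $f''(x) = g(x)/h(f'(x))$ and can
be verified by hand for Gaussian densities. The sketch below is a check under
assumptions of our own: the source density is standard normal, the target is normal
with hypothetical parameters `mean` and `std`, and the optimal map is the affine map
$\phi(x) = \mathrm{mean} + \mathrm{std}\cdot x$, the derivative of a convex quadratic potential.

```python
import math

def gaussian_density(x, mean, std):
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2 * math.pi))

mean, std = 1.5, 2.0           # hypothetical target parameters

def g(x):                      # source density: standard normal
    return gaussian_density(x, 0.0, 1.0)

def h(y):                      # target density: normal with the chosen mean, std
    return gaussian_density(y, mean, std)

def potential(x):              # convex potential f with f'(x) = mean + std * x
    return mean * x + 0.5 * std * x * x

def optimal_map(x):            # phi = f', the gradient of the potential
    return mean + std * x

def hessian(x):                # f'' (the 1x1 Hessian determinant)
    return std

def monge_ampere_residual(x):
    """det Hess f(x) - g(x) / h(grad f(x)); zero when phi pushes g to h."""
    return hessian(x) - g(x) / h(optimal_map(x))

print(max(abs(monge_ampere_residual(k / 2)) for k in range(-6, 7)))
```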



5. The Hamiltonian mechanics on diffeomorphism groups 

In this section we present a Hamiltonian framework for the Otto calculus and,
in particular, give a symplectic proof of Proposition 4.3 and Corollary 4.4 on the
submersion properties along with their generalizations.

5.1. Averaged Hamiltonians. We fix a Riemannian metric $\langle\,,\rangle^M$ on the manifold
M and consider the corresponding Riemannian metric $\langle\,,\rangle^{\mathcal{D}}$ on the diffeomorphism
group $\mathcal{D}$. This defines a map $(X \circ \phi) \mapsto \langle X \circ \phi, \cdot\,\rangle^{\mathcal{D}}$ from the tangent bundle $T\mathcal{D}$
to the cotangent bundle $T^*\mathcal{D}$. By using this map, one can pull back the canonical
symplectic form $\omega^{T^*\mathcal{D}}$ from the cotangent bundle $T^*\mathcal{D}$ to the tangent bundle $T\mathcal{D}$, and
regard the latter as a manifold equipped with the symplectic form $\omega^{T\mathcal{D}}$.³ Similarly, a
symplectic structure $\omega^{TM}$ can be defined on the tangent bundle TM by pulling back
the canonical symplectic form on the cotangent bundle $T^*M$ via the Riemannian
metric $\langle\,,\rangle^M$. The two symplectic forms are related as follows. A tangent vector
V in the tangent space $T_{X \circ \phi} T\mathcal{D}$ at the point $X \circ \phi \in T\mathcal{D}$ is a map from M to
$T(TM) = T^2M$ such that $\pi^{T^2M} \circ V = X \circ \phi$, where $\pi^{T^2M} : T(TM) \to TM$ is the
tangent bundle projection. Let $V_1$ and $V_2$ be two tangent vectors in $T_{X \circ \phi} T\mathcal{D}$ at the
point $X \circ \phi$, then the symplectic forms are related in the following way:

$\omega^{T\mathcal{D}}(V_1, V_2) = \int_M \omega^{TM}(V_1(x), V_2(x))\,\mu(x),$

where $\omega^{TM}$ is understood as the pairing on $T(TM) = T^2M$.

Definition 5.1. Let $H^M$ be a Hamiltonian function on the tangent bundle TM
of the manifold M. The averaged Hamiltonian function is the function $H^{\mathcal{D}}$ on
the tangent bundle $T\mathcal{D}$ of the diffeomorphism group $\mathcal{D}$ obtained by averaging the
corresponding Hamiltonian $H^M$ over M in the following way: its value at a point
$X \circ \phi \in T_\phi\mathcal{D}$ is

(5.4)  $H^{\mathcal{D}}(X \circ \phi) := \int_M H^M(X \circ \phi(x))\,\mu$

for a vector field $X \in \mathfrak{X}$ and a diffeomorphism $\phi \in \mathcal{D}$.

Consider the Hamiltonian flows for these Hamiltonian functions $H^M$ and $H^{\mathcal{D}}$
on the tangent bundles TM and $T\mathcal{D}$, respectively, with respect to the standard
symplectic structures on the bundles. The following theorem can be viewed as a
generalization of Propositions 4.1 and 4.3.

Theorem 5.2. Each Hamiltonian trajectory for the averaged Hamiltonian function
$H^{\mathcal{D}}$ on $T\mathcal{D}$ describes a flow on the tangent bundle TM, in which every tangent
vector to M moves along its own $H^M$-Hamiltonian trajectory in TM.

Example 5.3. For the Hamiltonian $K^M(p, q) = \frac{1}{2}\langle p, p \rangle^M$ given by the "kinetic
energy" for the metric on M, the above theorem implies that any geodesic on $\mathcal{D}$
is a family of diffeomorphisms of M, in which each particle moves along its own
geodesic on M with constant velocity, i.e. its velocity field is a solution to the
Burgers equation, cf. Remark 4.2.

³ The consideration of the tangent bundle $T\mathcal{D}$ (instead of $T^*\mathcal{D}$) as a symplectic manifold allows
one to avoid dealing with duals of infinite-dimensional spaces here.

Below we discuss this theorem and its geometric meaning in detail. In particular,
in the above form, the statement is also applicable to the case of nonholonomic
distributions (i.e. subriemannian, or Carnot-Caratheodory spaces) discussed in the
next section.

5.2. Riemannian submersion and symplectic quotients. We start with a Hamiltonian
proof of Proposition 4.3 on the Riemannian submersion $\mathcal{D} \to \mathcal{W}$ of
diffeomorphisms onto densities. Recall the following general construction in symplectic
geometry. Let $\pi : Q \to B$ be a principal bundle with the structure group G.

Lemma 5.4. The symplectic reduction of the cotangent bundle $T^*Q$
over the G-action gives the cotangent bundle $T^*B = T^*Q /\!/ G$.

Proof. The moment map $J : T^*Q \to \mathfrak{g}^*$ associated with this action takes $T^*Q$ to
the dual of the Lie algebra $\mathfrak{g} = \mathrm{Lie}(G)$. For the G-action on $T^*Q$ the moment map
J is the projection of any cotangent space $T^*_a Q$ to the cotangent space $T^*_a F \approx \mathfrak{g}^*$ for
the fiber F through a point $a \in Q$. The preimage $J^{-1}(0)$ of the zero value is the
subbundle of $T^*Q$ consisting of covectors vanishing on fibers. Such covectors are
naturally identified with covectors on the base B. Thus factoring out the G-action,
which moves the point a over the fiber F, we obtain the bundle $T^*B$. □

Suppose also that Q is equipped with a G-invariant Riemannian metric $\langle\,,\rangle^Q$.

Lemma 5.5. The Riemannian submersion of $(Q, \langle\,,\rangle^Q)$ to the base B with the
induced metric $\langle\,,\rangle^B$ is the result of the symplectic reduction.

Proof. Indeed, the metric $\langle\,,\rangle^Q$ gives a natural identification $T^*Q \approx TQ$ of the
tangent and cotangent bundles for Q, and the "projected metric" is equivalent to a
similar identification for the base manifold B.

In the presence of a metric on Q, the preimage $J^{-1}(0)$ is identified with all vectors
in TQ orthogonal to fibers, that is, $J^{-1}(0)$ is the horizontal subbundle in TQ. Hence,
the symplectic quotient $J^{-1}(0)/G$ can be identified with the tangent bundle TB. □

Proof of Proposition 4.3. Now we apply this "dictionary" to the diffeomorphism
group $\mathcal{D}$ and the Wasserstein space $\mathcal{W}$. Consider the projection map $\pi^{\mathcal{D}} : \mathcal{D} \to \mathcal{W}$
as a principal bundle with the structure group $\mathcal{D}_\mu$ of volume-preserving
diffeomorphisms of M. Recall that the vertical space of this principal bundle at a point
$\phi \in \mathcal{D}$ consists of right-translations by the diffeomorphism $\phi$ of vector fields which are
divergence-free with respect to the volume form $\phi_*\mu$: $\mathrm{Ver}_\phi = \{X \circ \phi \mid \mathrm{div}_{\phi_*\mu} X = 0\}$,
and the horizontal space is given by translated gradient fields:
$\mathrm{Hor}_\phi = \{\nabla f \circ \phi \mid f \in C^\infty(M)\}$.

For each volume-preserving diffeomorphism $\psi \in \mathcal{D}_\mu$, the $\mathcal{D}_\mu$-action of $\psi$ by
right translations on the diffeomorphism group is given by

$R_\psi(\phi) = \phi \circ \psi.$

The induced action $TR_\psi : T\mathcal{D} \to T\mathcal{D}$ on the tangent spaces of the diffeomorphism
group is given by

$TR_\psi(X \circ \phi) = (X \circ \phi) \circ \psi.$

One can see that for volume-preserving diffeomorphisms $\psi$ this action preserves
the Riemannian metric (4.2) on the diffeomorphism group $\mathcal{D}$ (it is the change of
variable formula), while for a general diffeomorphism one has an extra factor $D\psi$,
the Jacobian of $\psi$, in the integral. □

Remark 5.6. The explicit formula of the moment map $J : TQ \to \mathfrak{X}_\mu^*$ for the group
of volume-preserving diffeomorphisms $G = \mathcal{D}_\mu$ acting on $Q = \mathcal{D}$ is

$J(X \circ \phi)(Y) = \int_M \langle X \circ \phi, D\phi\,Y \rangle^M\,\mu,$

where $Y \in \mathfrak{X}_\mu$ is any vector field on M divergence-free with respect to the volume
form $\mu$, $X \in \mathfrak{X}$, and $\phi \in \mathcal{D}$.

5.3. Hamiltonian flows on the diffeomorphism groups. Let $H^Q : TQ \to \mathbb{R}$
be a Hamiltonian function invariant under the G-action on the cotangent bundle
of the total space Q. The restriction of the function $H^Q$ to the horizontal bundle
$J^{-1}(0) \subset TQ$ is also G-invariant, and hence descends to a function $H^B : TB \to \mathbb{R}$
on the symplectic quotient, the tangent bundle of the base B. Symplectic quotients
admit the following reduction of Hamiltonian dynamics:

Proposition 5.7. [2] The Hamiltonian flow of the function $H^Q$ preserves the
preimage $J^{-1}(0)$, i.e. trajectories with horizontal initial conditions stay horizontal.
Furthermore, the Hamiltonian flow of the function $H^Q$ on the tangent bundle TQ of the
total space Q descends to the Hamiltonian flow of the function $H^B$ on the tangent
bundle TB of the base.

Now we are going to apply this scheme to the bundle $\mathcal{D} \to \mathcal{W}$. For a fixed
Hamiltonian function $H^M$ on the tangent bundle TM to the manifold M, consider
the corresponding averaged Hamiltonian function $H^{\mathcal{D}}$ on $T\mathcal{D}$, given by the formula
(5.4): $H^{\mathcal{D}}(X \circ \phi) := \int_M H^M(X \circ \phi(x))\,\mu$. The latter Hamiltonian is $\mathcal{D}_\mu$-invariant
(as also follows from the change of variable formula) and it will play the role of the
function $H^Q$. Thus the flow for the averaged Hamiltonian $H^{\mathcal{D}}$ descends to the flow
of a certain Hamiltonian $H^{\mathcal{W}}$ on $T\mathcal{W}$.

We now describe explicitly the corresponding flow on the tangent bundles of $\mathcal{D}$ and $\mathcal{W}$.
Let $\Psi_t^{H^M} : TM \to TM$ be the Hamiltonian flow of the Hamiltonian $H^M$ on the
tangent bundle of the manifold M, and let $\Psi_t^{H^{\mathcal{D}}} : T\mathcal{D} \to T\mathcal{D}$ denote the flow for the
Hamiltonian function $H^{\mathcal{D}}$ on the tangent bundle of the diffeomorphism group.

Theorem 5.8 (= 5.2). The Hamiltonian flows of the Hamiltonians $H^{\mathcal{D}}$ and $H^M$
are related by

$\Psi_t^{H^{\mathcal{D}}}(X \circ \phi)(x) = \Psi_t^{H^M}(X(\phi(x))),$

where, on the right-hand side, the flow $\Psi_t^{H^M}$ on TM transports the shifted field
$X(\phi(x))$, while, on the left-hand side, $X \circ \phi$ is regarded as a tangent vector to $\mathcal{D}$ at
the point $\phi$.

Proof. We prove this infinitesimally (cf. [6]). Let $X_{H^{\mathcal{D}}}$ and $X_{H^M}$ be the Hamiltonian
vector fields corresponding to the Hamiltonians $H^{\mathcal{D}}$ and $H^M$, respectively. We claim
that $X_{H^{\mathcal{D}}}(X \circ \phi) = X_{H^M} \circ X \circ \phi$. Indeed, by the definition of Hamiltonian fields,
we have

$\omega^{T\mathcal{D}}(X_{H^M} \circ X \circ \phi, Y) = \int_M \omega^{TM}(X_{H^M}(X(\phi(x))), Y(x))\,\mu = \int_M dH^M_{X(\phi(x))}(Y(x))\,\mu(x)$

for any $Y \in T_{X \circ \phi} T\mathcal{D}$. By interchanging the integration and exterior differentiation, the
latter expression becomes $dH^{\mathcal{D}}_{X \circ \phi}(Y)$ and the result follows. □

Remark 5.9. This theorem has a simple geometric meaning for the "kinetic energy"
Hamiltonian function $K^M(v) := \frac{1}{2}\langle v, v \rangle^M$ on the tangent bundle TM. One of the
possible definitions of geodesics in M is that they are projections to M of trajectories
of the Hamiltonian flow on TM, whose Hamiltonian function is the kinetic energy.
In other words, the Riemannian exponential map $\exp^M$ on the manifold M is the
projection of the Hamiltonian flow $\Psi_t^{K^M}$ on TM. Similarly, the Riemannian
exponential $\exp^{\mathcal{D}}$ of the diffeomorphism group $\mathcal{D}$ is the projection of the Hamiltonian
flow for the Hamiltonian $K^{\mathcal{D}}(X \circ \phi) := \frac{1}{2}\int_M \langle X \circ \phi, X \circ \phi \rangle^M\,\mu$ on $T\mathcal{D}$.

Recall that the geodesics on the diffeomorphism group (described by the Burgers
equation, see Proposition 4.1) starting at the identity with the initial velocity
$V \in T_{\mathrm{id}}\mathcal{D}$ are the flows which move each particle x on the manifold M along the geodesic
with the direction V(x). Such a geodesic is well defined on the diffeomorphism group
$\mathcal{D}$ as long as the particles do not collide. The corresponding Hamiltonian flow on the
tangent bundle $T\mathcal{D}$ of the diffeomorphism group describes how the corresponding
velocities of these particles vary (cf. Example 5.3).

For a more general Hamiltonian $H^M$ on the tangent bundle TM, each particle
$x \in M$ with an initial velocity V(x) will be moving along the corresponding
characteristic, which is the projection to M of the corresponding trajectory $\Psi_t^{H^M}(V(x))$
in the tangent bundle TM.

Now we would like to describe more explicitly horizontal geodesics and characteristics
on the diffeomorphism group $\mathcal{D}$. Recall that $\Psi_t^{H^{\mathcal{D}}}$ denotes the Hamiltonian flow
of the averaged Hamiltonian $H^{\mathcal{D}}$ on the tangent bundle $T\mathcal{D}$ of the diffeomorphism
group $\mathcal{D}$. If this Hamiltonian flow is gradient at the initial moment, it always stays
gradient, as implied by Corollary 4.4. Furthermore, the corresponding potential can
be described as follows.

Corollary 5.10. Let f be a function on the manifold M. Then the Hamiltonian
flow for $H^{\mathcal{D}}$ with the initial condition $\nabla f \circ \phi \in T_\phi\mathcal{D}$ has the form $\nabla f_t \circ \phi_t$, where
$\phi_t \in \mathcal{D}$ is a family of diffeomorphisms and $f_t$ is the family of functions on M starting
at $f_0 = f$ and satisfying the Hamilton-Jacobi equation

(5.5)  $\partial_t f_t + H^M(\nabla f_t(x)) = 0.$

Proof. This follows from the method of characteristics, which gives the following
way of finding $f_t$, the solution to the Hamilton-Jacobi equation (5.5). Consider the
tangent vector $\nabla f(x)$ for each point $x \in M$. Denote by $\Psi_t^{H^M} : TM \to TM$ the
Hamiltonian flow for the Hamiltonian $H^M : TM \to \mathbb{R}$ and consider its trajectory
$t \mapsto \Psi_t^{H^M}(\nabla f(x))$ starting at the tangent vector $\nabla f(x)$. Then project this trajectory
to M using the tangent bundle projection $\pi^{TM} : TM \to M$ to obtain a curve in M.
It is given by the formula $t \mapsto \pi^{TM}(\Psi_t^{H^M}(\nabla f(x)))$. As x varies over the manifold
M, this defines a flow $\phi_t := \pi^{TM} \circ \Psi_t^{H^M} \circ \nabla f$ on M. (Note that this procedure
defines a flow for small time t, while for larger times the map $\phi_t$ may cease to be a
diffeomorphism, i.e. shock waves can appear.) The corresponding time-dependent
vector field is gradient and defines the family $\nabla f_t$, the gradient of the solution to
the Hamilton-Jacobi equation above, see Figure 3. □




Figure 3. Hamiltonian flow $\Psi_t^{H^M}$ of the Hamiltonian $H^M$ and its projection:
the curve $\phi_t(x)$ is the projection of the curve $\Psi_t^{H^M}(\nabla f(x))$ to
the manifold M.

Remark 5.11. The above corollary manifests that the Hamilton-Jacobi equation
(5.5) can be solved using the method of characteristics due to the built-in symmetry
group of all volume preserving diffeomorphisms.
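
The method of characteristics can be illustrated on the line with the kinetic energy
Hamiltonian $H(v) = v^2/2$. The sketch below uses a hypothetical quadratic initial
potential $f_0(x) = Ax^2/2$, for which $f_t(x) = Ax^2/(2(1 + At))$ is a closed-form
candidate solution of (5.5); this formula is our own worked example, not taken from
the text. The code checks the Hamilton-Jacobi equation by a finite difference in t
and checks that the gradient of $f_t$ is transported unchanged along the characteristic
$\phi_t(x) = x + t\,f_0'(x)$. For A < 0 the formula breaks down at t = -1/A, which is
the shock time.

```python
A = 0.8  # curvature of the initial potential (hypothetical choice)

def f(x, t):
    """Candidate solution f_t(x) = A x^2 / (2 (1 + A t)) with f_0 = A x^2 / 2."""
    return A * x * x / (2.0 * (1.0 + A * t))

def grad_f(x, t):
    """Spatial derivative of f_t."""
    return A * x / (1.0 + A * t)

def hamiltonian(v):
    """Kinetic energy H(v) = v^2 / 2 on the tangent bundle of the line."""
    return 0.5 * v * v

def characteristic(x, t):
    """phi_t(x): a free particle starting at x with velocity grad f_0(x)."""
    return x + t * grad_f(x, 0.0)

def hj_residual(x, t, h=1e-5):
    """Central-difference estimate of d_t f_t + H(grad f_t) at (x, t)."""
    df_dt = (f(x, t + h) - f(x, t - h)) / (2 * h)
    return df_dt + hamiltonian(grad_f(x, t))

print(hj_residual(1.3, 0.7))
print(grad_f(characteristic(1.3, 0.7), 0.7) - grad_f(1.3, 0.0))
```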

5.4. Hamiltonian flows on the Wasserstein space. What is the corresponding
flow on the tangent bundle $T\mathcal{W}$ of the Wasserstein space, induced by the Hamiltonian
flow on $T\mathcal{D}$ for the diffeomorphism group $\mathcal{D}$ after the projection $\pi^{\mathcal{D}} : \mathcal{D} \to \mathcal{W}$?
Fix a Hamiltonian $H^M$ on the tangent bundle TM which defines the averaged Hamiltonian
function $H^{\mathcal{D}}$ on the tangent bundle $T\mathcal{D}$, see Equation (5.4). We now describe
explicitly the induced Hamiltonian $H^{\mathcal{W}}$ on the tangent bundle $T\mathcal{W}$.

Let $(\nu, \eta)$ be a tangent vector at a density $\nu$ on M, regarded as a point of the
Wasserstein space $\mathcal{W}$. The normalization of densities ($\int_M \nu = 1$ for all $\nu \in \mathcal{W}$) gives
the constraint for tangent vectors: $\int_M \eta = 0$. Let $f : M \to \mathbb{R}$ be a function that
satisfies $-(\mathrm{div}_\nu \nabla f)\,\nu = \eta$. (Given $(\nu, \eta)$, such a function is defined uniquely up to
an additive constant.) Then the induced Hamiltonian on the tangent bundle $T\mathcal{W}$
of the base $\mathcal{W}$ is given by

$H^{\mathcal{W}}(\nu, \eta) := \int_M H^M(\nabla f(x))\,\nu,$

since $\nabla f$ is a vector of the horizontal distribution in $T\mathcal{D}$.

Now, the flow $\Psi_t^{H^{\mathcal{W}}}$ of the corresponding Hamiltonian field on $T\mathcal{W}$ can be found
explicitly by employing Proposition 5.7. Consider the flow $\phi_t := \pi^{TM} \circ \Psi_t^{H^M} \circ \nabla f$
defined on M for small t in Corollary 5.10.

Theorem 5.12. The Hamiltonian flow $\Psi_t^{H^{\mathcal{W}}}$ of the Hamiltonian function $H^{\mathcal{W}}$ on
the tangent bundle $T\mathcal{W}$ of the Wasserstein space $\mathcal{W}$ is

(5.6)  $\Psi_t^{H^{\mathcal{W}}}(\nu, -\mathcal{L}_{\nabla f}\nu) = (\nu_t, -\mathcal{L}_{\nabla f_t}\nu_t),$

where $\mathcal{L}$ is the Lie derivative, the family of functions $f_t$ satisfies the Hamilton-Jacobi
equation (5.5) for the Hamiltonian function $H^M$ on the tangent bundle TM, and the
family $\nu_t = (\phi_t)_*\nu$ is the push forward of the volume form $\nu$ by the map $\phi_t$ defined
above.

Proof. The function $H^{\mathcal{D}}(X \circ \phi) = \int_M H^M(X(\phi(x)))\,\mu(x)$ on the tangent bundle $T\mathcal{D}$
of the diffeomorphism group induces the Hamiltonian $H^{\mathcal{W}}$ on $T\mathcal{W}$. By virtue of the
Hamiltonian reduction, Hamiltonian trajectories of $H^{\mathcal{D}}$ contained in the horizontal
bundle $\mathrm{Hor} = \{\nabla f \circ \phi \mid f \in C^\infty(M)\}$ descend to Hamiltonian trajectories of $H^{\mathcal{W}}$.
The Hamiltonian flow $\Psi_t^{H^{\mathcal{D}}}$ of the Hamiltonian $H^{\mathcal{D}}$ is given by
$\Psi_t^{H^{\mathcal{D}}}(X \circ \phi) = \Psi_t^{H^M} \circ X \circ \phi$, due to Theorem 5.8. By restricting this to the
horizontal bundle Hor we have

(5.7)  $\Psi_t^{H^{\mathcal{D}}}(\nabla f \circ \phi) = \Psi_t^{H^M} \circ \nabla f \circ \phi.$

The flow $\Psi_t^{H^{\mathcal{D}}}$ is described in Corollary 5.10 and has the form
$\Psi_t^{H^{\mathcal{D}}}(\nabla f \circ \phi) = \nabla f_t \circ \phi_t$, where $f_t$ and $\phi_t$ are defined as required.

On the other hand, recall that the projection $\pi^{\mathcal{D}} : \mathcal{D} \to \mathcal{W}$ is defined by
$\pi^{\mathcal{D}}(\phi) = \phi_*\mu$. The differential $D\pi^{\mathcal{D}}$ of this map is

$D\pi^{\mathcal{D}}(X \circ \phi) := (\phi_*\mu, -\mathcal{L}_X(\phi_*\mu)).$

The application of this relation to (5.7) gives the result. □

Remark 5.13. The time-one-map for the above density flow $\nu_t$ in the Wasserstein
space $\mathcal{W}$ formally describes optimal transport maps for the Hamiltonian $H^M$. In
particular, it recovers the optimal map recently obtained in [1]. One considers the
optimal transport problem for the functional

$\inf_\phi \Big\{ \int_M c(x, \phi(x))\,\mu \;\Big|\; \phi_*\mu = \nu \Big\}$

with the cost function c defined by

$c(x, y) = \inf_{\gamma} \int_0^1 L(\gamma, \dot\gamma)\,dt,$

where the infimum is taken over paths $\gamma$ joining x and y and the Lagrangian
$L : TM \to \mathbb{R}$ satisfies certain regularity and convexity assumptions, see [1]. The
corresponding Hamiltonian $H^M$ in Theorem 5.12 is the Legendre transform of the
Lagrangian L. Note that for the "kinetic energy" Lagrangian $K^M$, the above map
becomes the optimal map $\exp^M(-\nabla f)$ mentioned at the beginning of this section,
with $\exp^M : TM \to M$ being the Riemannian exponential of the manifold M.



6. The subriemannian geometry of diffeomorphism groups 

In this section we develop the subriemannian setting for the diffeomorphism group. 
In particular, we derive the geodesic equations for the "nonholonomic Wasserstein 
metric," and describe nonholonomic versions of the Monge-Ampere and heat equa- 
tions. 

Let M be a manifold with a fixed distribution $\tau$ on it. Recall that a subriemannian
metric is a positive definite inner product $\langle\,,\rangle^\tau$ on each plane of the distribution $\tau$
smoothly depending on a point in M. Such a metric can be defined by the bundle
map $\mathcal{I} : T^*M \to \tau$, sending a covector $\alpha_x \in T^*M$ to the vector $V_x$ in the plane
$\tau_x$ such that $\alpha_x(U) = \langle V_x, U \rangle^\tau$ on vectors $U \in \tau_x$. The subriemannian Hamiltonian
$H^\tau : T^*M \to \mathbb{R}$ is the corresponding fiberwise quadratic form:

(6.8)  $H^\tau(\alpha_x) = \frac{1}{2}\langle V_x, V_x \rangle^\tau.$

Let $\Psi_t^{H^\tau}$ be the Hamiltonian flow for time t of the subriemannian Hamiltonian $H^\tau$
on $T^*M$, while $\pi^{T^*M} : T^*M \to M$ is the cotangent bundle projection. Then the
subriemannian exponential map $\exp^\tau : T^*M \to M$ is defined as the projection to
M of the time-one-map of the above Hamiltonian flow on $T^*M$:

(6.9)  $\exp^\tau(t\alpha_x) := \pi^{T^*M}\,\Psi_t^{H^\tau}(\alpha_x).$

This relation defines a normal subriemannian geodesic on M with the initial covector
$\alpha_x$. Note that the initial velocity of the subriemannian geodesic $\exp^\tau(t\alpha_x)$ is
$V_x = \mathcal{I}\alpha_x \in \tau_x$. So, unlike the Riemannian case, there are many subriemannian geodesics
having the same initial velocity $V_x$ on M.
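
The last observation can be seen numerically. The sketch below is an illustration
under assumptions of our own: M is taken to be the Heisenberg group $\mathbb{R}^3$ with the
distribution spanned by the orthonormal fields $X = \partial_x - (y/2)\partial_z$ and
$Y = \partial_y + (x/2)\partial_z$, so that $H^\tau = \frac{1}{2}(u^2 + v^2)$ with $u = p_x - y p_z/2$ and
$v = p_y + x p_z/2$. Hamilton's equations are integrated by a classical Runge-Kutta
scheme; all function names are hypothetical. Two covectors over the origin that differ
only in $p_z$ have the same initial velocity but different geodesics.

```python
import math

def hamiltonian_field(state):
    """Right-hand side of Hamilton's equations for H = (u^2 + v^2) / 2."""
    x, y, z, px, py, pz = state
    u = px - 0.5 * y * pz      # momentum along X = d/dx - (y/2) d/dz
    v = py + 0.5 * x * pz      # momentum along Y = d/dy + (x/2) d/dz
    return (u, v, 0.5 * (x * v - y * u),        # dq/dt =  dH/dp
            -0.5 * v * pz, 0.5 * u * pz, 0.0)   # dp/dt = -dH/dq

def rk4_flow(state, t, steps=1000):
    """Time-t Hamiltonian flow by the classical fourth-order Runge-Kutta method."""
    h = t / steps
    for _ in range(steps):
        k1 = hamiltonian_field(state)
        k2 = hamiltonian_field(tuple(s + 0.5 * h * k for s, k in zip(state, k1)))
        k3 = hamiltonian_field(tuple(s + 0.5 * h * k for s, k in zip(state, k2)))
        k4 = hamiltonian_field(tuple(s + h * k for s, k in zip(state, k3)))
        state = tuple(s + h * (a + 2 * b + 2 * c + d) / 6
                      for s, a, b, c, d in zip(state, k1, k2, k3, k4))
    return state

def sr_exp(covector, t=1.0):
    """Projection to R^3 of the time-t flow started over the origin."""
    return rk4_flow((0.0, 0.0, 0.0) + tuple(covector), t)[:3]

origin = (0.0, 0.0, 0.0)
alpha, beta = (1.0, 0.0, 0.0), (1.0, 0.0, 2.0)   # same horizontal part, different p_z
print(hamiltonian_field(origin + alpha)[:3], hamiltonian_field(origin + beta)[:3])
print(sr_exp(alpha), sr_exp(beta))
```

The first covector produces a straight horizontal line, while the second produces a
curve whose projection to the (x, y)-plane is an arc of a circle.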

Let $d^\tau$ be a subriemannian (or, Carnot-Caratheodory) distance on the manifold
M, defined as the infimum of the length of all absolutely continuous admissible (i.e.
tangent to $\tau$) curves joining two given points. For a bracket-generating distribution
$\tau$ any two points can be joined by such a curve, so this distance is always
finite. Consider the corresponding optimal transport problem by replacing the
Riemannian distance d in (4.1) with the subriemannian distance $d^\tau$. Below we study the
infinite-dimensional geometry of this subriemannian version of the optimal transport
problem. Although in general normal subriemannian geodesics might not exhaust
all the length minimizing geodesics in subriemannian manifolds (see [H]), we will
see that in the problems of subriemannian optimal transport one can confine oneself
to only such geodesics.

6.1. Subriemannian submersion. Consider the following general setting: Let
$(Q, \mathcal{T})$ be a subriemannian space, i.e. a manifold Q with a distribution $\mathcal{T}$ and a
subriemannian metric $\langle\,,\rangle^{\mathcal{T}}$ on it. Suppose that $Q \to B$ is a bundle projection to a
Riemannian base manifold B.

Definition 6.1. The projection $\pi : (Q, \mathcal{T}) \to B$ is a subriemannian submersion
if the distribution $\mathcal{T}$ contains a horizontal subdistribution $\mathcal{T}^{hor}$, orthogonal (with
respect to the subriemannian metric) to the intersections of $\mathcal{T}$ with fibers, and the
projection $\pi$ maps the spaces $\mathcal{T}^{hor}$ isometrically to the tangent spaces of the base
B, see Figure 4.

Let a subriemannian submersion $\pi : (Q, \mathcal{T}) \to B$ be a principal G-bundle $Q \to B$,
where the distribution $\mathcal{T}$ and the subriemannian metric are invariant with respect
to the action of the group G. The following theorem is an analog of Corollary 4.4.

Theorem 6.2. For each point b in the base B and a point q in the fibre $\pi^{-1}(b) \subset Q$
over b, every Riemannian geodesic on the base B starting at b admits a unique lift
to a subriemannian geodesic on Q starting at q with the velocity vector in $\mathcal{T}^{hor}$.

Example 6.3. Consider the standard Hopf bundle $\pi : S^3 \to S^2$, with the
two-dimensional distribution $\mathcal{T}$ transversal to the fibers $S^1$. Fix the standard metric
on the base $S^2$ and lift it to a subriemannian metric on $S^3$, which defines a
subriemannian submersion. If the distribution $\mathcal{T}$ is orthogonal to the fibers, the manifold
$(S^3, \mathcal{T})$ can locally be thought of as the Heisenberg 3-dimensional group. Then
all subriemannian geodesics on $S^3$ with a given horizontal velocity project to a
1-parameter family of circles on $S^2$ with a common tangent element. However, only
one of these circles, the equator, is a geodesic on the standard sphere $S^2$. Thus the
equator can be uniquely lifted to a subriemannian geodesic on $S^3$ with the given
initial vector.

Note that the uniqueness of this lifting holds even if the distribution $\mathcal{T}$ is not
orthogonal, but only transversal, say at a fixed angle, to the fibers $S^1$, see Figure 5.

Figure 5. Projections of subriemannian geodesics from $(S^3, \mathcal{T})$ in
the Hopf bundle give circles in $S^2$, only one of which, the equator, is
a geodesic on the base $S^2$.



Proof of Theorem 6.2. To prove this theorem we describe the Hamiltonian setting
of the subriemannian submersion.
Let Ver be the vertical subbundle in TQ (i.e. tangent planes to the fibers of the
projection $Q \to B$). Define $\mathrm{Ver}^\perp \subset T^*Q$ to be the corresponding annihilator, i.e.
$\mathrm{Ver}^\perp_q$ is the set of all covectors $\alpha_q \in T^*Q$ at the point $q \in Q$ which annihilate the
vertical space $\mathrm{Ver}_q$.

Definition 6.4. The restriction of the subriemannian exponential map
$\exp^{\mathcal{T}} : T^*Q \to Q$ to the distribution $\mathrm{Ver}^\perp$ is called the horizontal exponential

$\exp^{\mathcal{T}} : \mathrm{Ver}^\perp \to Q$

and the corresponding geodesics are the horizontal subriemannian geodesics.

The symplectic reduction identifies the quotient $\mathrm{Ver}^\perp / G$ with the cotangent
bundle $T^*B$ of the base. Note that the subdistribution $\mathcal{T}^{hor}$ defines a horizontal bundle
for the principal bundle $Q \to B$ in the usual sense. The definition of subriemannian
submersion (translated to the cotangent spaces, where we replace $\mathcal{T}^{hor}$ by $\mathrm{Ver}^\perp$)
gives that the subriemannian Hamiltonian $H^{\mathcal{T}}$ defined by (6.8) descends to a
Riemannian Hamiltonian $H^{B,\mathcal{T}}$ on $T^*B$. Moreover, Hamiltonian trajectories of $H^{B,\mathcal{T}}$
starting at the cotangent space $T^*_b B$ are in one-to-one correspondence with the
trajectories of $H^{\mathcal{T}}$ starting at the space $\mathrm{Ver}^\perp$. The projection of these Hamiltonian
trajectories to the manifolds B and Q via the cotangent bundle projections $\pi^{T^*B}$
and $\pi^{T^*Q}$, respectively, gives the result. □



Corollary 6.5. For a subriemannian submersion, geodesics on the base give rise
only to normal geodesics in the total space.

In order to describe the geodesic geometry on the tangent, rather than cotangent,
bundle of the manifold Q, we fix a Riemannian metric on Q whose restriction to
the distribution $\mathcal{T}$ is the given subriemannian metric $\langle\,,\rangle^{\mathcal{T}}$. This Riemannian metric
allows one to identify the cotangent bundle $T^*Q$ with the tangent bundle TQ. Then
the exponential map $\exp^{\mathcal{T}}$ can be viewed as a map $TQ \to Q$. It is convenient to think
of $\mathcal{T}^{hor}$ as the horizontal bundle and identify it with the annihilator $\mathrm{Ver}^\perp$. This
way horizontal subriemannian geodesics are geodesics with initial (co)vector in the
horizontal bundle $\mathcal{T}^{hor}$. This identification is particularly convenient for the
infinite-dimensional setting, where we work with the tangent bundle of the diffeomorphism
group.



6.2. A subriemannian analog of the Otto calculus. Fix a Riemannian metric
$\langle\,,\rangle$ on the manifold M. Let $P^\tau : TM \to \tau$ be the orthogonal projection of vectors
on M onto the distribution $\tau$ with respect to this metric. Let $(\nu, \eta_1)$ and $(\nu, \eta_2)$ be
two tangent vectors in the tangent space at the point $\nu$ of the smooth Wasserstein
space. Recall that for a fixed volume form $\mu$, we define the subriemannian
Laplacian as $\Delta^\tau f := \mathrm{div}_\mu(P^\tau \nabla f)$.

Define a nonholonomic Wasserstein metric as the (weak) Riemannian metric on
the (smooth) Wasserstein space $\mathcal{W}$ given by

(6.10)  $\langle (\nu, \eta_1), (\nu, \eta_2) \rangle^{\mathcal{W},\tau} := \int_M \langle P^\tau \nabla f_1(x), P^\tau \nabla f_2(x) \rangle^M\,\nu,$

where functions $f_1$ and $f_2$ are solutions of the subriemannian Poisson equation

$-(\Delta^\tau f_i)\,\nu = \eta_i$

for the measure $\nu$.

Theorem 6.6. The geodesies on the Wasserstein space W equipped with the non- 
holonomic Wasserstein metric A6.10\) have the form (exp r (tP r V/))*z/ ; where exp r : 
q-hor _^ j^j- ^ s ^ e fi 0r i zon t a i exponential map and v is any point ofW. 

To prove this theorem we first note that the Riemannian metric ( , ) v defined on 
the diffeomorphism group restricts to a subriemannian metric ( , ) v,r on the right 
invariant bundle T. 

Proposition 6.7. The map $\pi : (\mathcal{D}, \mathcal{T}) \to W$ is a subriemannian submersion of the
subriemannian metric $\langle\,,\rangle^{\mathcal{D},\tau}$ on the diffeomorphism group with distribution $\mathcal{T}$ to
the nonholonomic Wasserstein metric $\langle\,,\rangle^{W,\tau}$.

Proof. This statement can be derived from the Hamiltonian reduction, similarly to 
the Riemannian case. 

Here we prove it by an explicit computation. Recall that the map $\pi : \mathcal{D} \to W$
is defined by $\pi(\varphi) = \varphi_* \mu$. Let $X \circ \varphi$ be a tangent vector at the point $\varphi$ in the
diffeomorphism group $\mathcal{D}$. Consider the flow $\varphi_t$ of the vector field $X$, and note that
$\pi(\varphi_t \circ \varphi) = \varphi_{t*} \varphi_* \mu$. To compute the derivative $D\pi$ we differentiate this equation
with respect to time $t$ at $t = 0$:

$D\pi(X \circ \varphi) = -\mathcal{L}_X(\varphi_* \mu) = -(\operatorname{div}_{\varphi_* \mu} X)\, \varphi_* \mu$,

by the definition of Lie derivative. A vector field $X$ from the horizontal bundle $\mathcal{T}^{\mathrm{hor}}$
has the form $(P^\tau \nabla f) \circ \varphi$, and for it the equation becomes

$D\pi((P^\tau \nabla f) \circ \varphi) = -(\Delta^\tau f)\, \varphi_* \mu$,

where the Laplacian $\Delta^\tau$ is taken with respect to the volume form $\varphi_* \mu$.

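Therefore, for horizontal tangent vectors $(P^\tau \nabla f_1) \circ \varphi$ and $(P^\tau \nabla f_2) \circ \varphi$ at the point
$\varphi$ their subriemannian inner product is

$\langle (P^\tau \nabla f_1) \circ \varphi, (P^\tau \nabla f_2) \circ \varphi \rangle^{\mathcal{D},\tau} = \int_M \langle P^\tau \nabla f_1 \circ \varphi, P^\tau \nabla f_2 \circ \varphi \rangle^M \, \mu$.

After the change of variables this becomes

$\int_M \langle P^\tau \nabla f_1, P^\tau \nabla f_2 \rangle^M \, \varphi_* \mu = \langle D\pi((P^\tau \nabla f_1) \circ \varphi), D\pi((P^\tau \nabla f_2) \circ \varphi) \rangle^{W,\tau}$,
which completes the proof. □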
Proof of Theorem 6.6. To describe geodesics in the nonholonomic Wasserstein space
we define the Hamiltonian $H^\tau : T\mathcal{D} \to \mathbb{R}$ by

(6.11) $H^\tau(X \circ \varphi) := \int_M \langle (P^\tau X) \circ \varphi, (P^\tau X) \circ \varphi \rangle \, \mu$.

The Hamiltonian flow with Hamiltonian $H^\tau$ has the form $\exp_\tau((t P^\tau X) \circ \varphi)$ according
to Theorem 5.8. By taking its restriction to the bundle $\mathcal{T}^{\mathrm{hor}}$ and projecting to
the base we obtain that the geodesics on the smooth Wasserstein space are

$(\exp_\tau((t P^\tau \nabla f) \circ \varphi))_* \nu$,

where $\nu = \varphi_* \mu$ and $P^\tau \nabla f$ is defined by the Hodge decomposition for the field $X$.
This completes the proof of Theorem 6.6. □

Remark 6.8. For a horizontal subriemannian geodesic $\varphi_t(x) := \exp_\tau(t P^\tau \nabla f(x))$
with a smooth function $f$, the diffeomorphism $\varphi_t$ satisfies $\frac{d}{dt} \varphi_t = (P^\tau \nabla f_t) \circ \varphi_t$ and
$f_t$ is the solution of the Hamilton-Jacobi equation

(6.12) $\dot f_t + H^\tau(\nabla f_t) = 0$

with the initial condition $f_0 = f$, see Corollary 5.10. This equation determines
horizontal subriemannian geodesics on the diffeomorphism group $\mathcal{D}$. In the Riemannian
case, one can see that the vector fields $V_t = \frac{d}{dt} \varphi_t = \nabla f_t \circ \varphi_t$ satisfy the Burgers
equation by taking the gradient of both sides in (6.12), cf. Proposition 4.1.
Hence Equation (6.12) can be viewed as a subriemannian analog of the potential
Burgers equation in $\mathcal{D}$. However, a subriemannian analog of the Burgers equation
for nonhorizontal (i.e. nonpotential) normal geodesics on the diffeomorphism group
is not so explicit.
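
The passage from the Hamilton-Jacobi equation to the Burgers equation in the Riemannian case can be sketched as follows. The normalization $H(p) = \tfrac12 |p|^2$ and the symbol $u_t := \nabla f_t$ (the Eulerian velocity, so that $V_t = u_t \circ \varphi_t$) are assumptions made here for illustration; with a different normalization of the Hamiltonian the constants change accordingly.

```latex
% Riemannian case, assuming H(p) = \tfrac12 |p|^2 and writing u_t := \nabla f_t.
\begin{align*}
  \dot f_t + \tfrac12 |\nabla f_t|^2 &= 0
    && \text{(Hamilton-Jacobi equation)} \\
  \nabla \dot f_t + \nabla\bigl(\tfrac12 |\nabla f_t|^2\bigr) &= 0
    && \text{(gradient of both sides)} \\
  \dot u_t + \nabla_{u_t} u_t &= 0
    && \text{(Burgers equation)}
\end{align*}
% The last step uses \nabla(\tfrac12 |u|^2) = \nabla_u u for a gradient field
% u = \nabla f, which holds because the Hessian of f is symmetric.
```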

Remark 6.9. If the function $f$ is smooth, the time-one map $\varphi(x) := \exp_\tau(P^\tau \nabla f(x))$
along the geodesics described in Theorem 6.6 satisfies the following nonholonomic
analog of the Monge-Ampère equation: $h(\varphi(x)) \det(D\varphi(x)) = g(x)$, where $g$ and $h$
are functions on the manifold $M$ defining two densities $\theta = g\,\mathrm{vol}$ and $\nu = h\,\mathrm{vol}$.

Furthermore, for the case of the Heisenberg group this formal solution $\varphi(x)$
coincides with the optimal map obtained in [TJ]. The (minus) potential $-f$ of the
corresponding optimal map satisfies the $c$-concavity condition for $c = d_\tau^2/2$, where
$d_\tau$ is the subriemannian distance, cf. Remark 4.5.



6.3. The nonholonomic heat equation. Consider the heat equation $\partial_t u = \Delta u$
for a function $u$ on the manifold $M$, where the operator $\Delta$ is given by $\Delta f = \operatorname{div}_\mu \nabla f$.
Upon multiplying both sides of the heat equation by the fixed volume form $\mu$,
one can regard it as an evolution equation on the smooth Wasserstein space $W$.
Note that the right-hand side of the heat equation gives a tangent vector $(\Delta u)\mu$ at
the point $u\mu$ of the Wasserstein space. The Boltzmann relative entropy functional
$\mathrm{Ent} : W \to \mathbb{R}$ is defined by the integral

(6.13) $\mathrm{Ent}(\nu) := \int_M \log(\nu/\mu)\, \nu$.

The gradient flow of Ent on the Wasserstein space with respect to the metric $d$ gives
the heat equation, see [18].

Recall that one can define the subriemannian Laplacian: $\Delta^\tau f := \operatorname{div}_\mu(P^\tau \nabla f)$ for
a fixed volume form $\mu$ on $M$. The natural generalization of the heat equation to the
nonholonomic setting is as follows.

Definition 6.10. The nonholonomic (or, subriemannian) heat equation is the
equation $\partial_t u = \Delta^\tau u$ on a time-dependent function $u$ on $M$.
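
As an illustration, the operator $\Delta^\tau$ can be written out on the Heisenberg group. The coordinates $(x, y, z)$ on $\mathbb{R}^3$, the frame $X, Y$, the choice of the Lebesgue volume form for $\mu$, and a Riemannian metric making $X, Y, \partial_z$ orthonormal are assumptions of this sketch, not choices fixed by the text above.

```latex
% Heisenberg group on R^3 (one common normalization of the frame):
\[
  X = \partial_x - \tfrac{y}{2}\,\partial_z, \qquad
  Y = \partial_y + \tfrac{x}{2}\,\partial_z, \qquad
  [X, Y] = \partial_z ,
\]
% so \tau = span{X, Y} is bracket-generating. With X, Y, \partial_z orthonormal,
\[
  P^\tau \nabla f = (Xf)\,X + (Yf)\,Y .
\]
% X and Y are divergence-free for \mu = dx \wedge dy \wedge dz, hence
\[
  \Delta^\tau f = \operatorname{div}_\mu(P^\tau \nabla f) = X^2 f + Y^2 f ,
\]
% and the nonholonomic heat equation reads
\[
  \partial_t u = X^2 u + Y^2 u .
\]
```

In this model the operator is a sum of squares of the two horizontal fields only; no second derivative in the direction $\partial_z$ appears directly.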

Below we show that this equation in the nonholonomic setting also admits a 
gradient interpretation on the Wasserstein space. 

Theorem 6.11. The nonholonomic heat equation $\partial_t u = \Delta^\tau u$ describes the gradient
flow on the Wasserstein space with respect to the relative entropy functional (6.13)
and the nonholonomic Wasserstein metric (6.10).

Namely, for the volume form $\nu_t := g_t^* \mu$ and the gradient $\nabla^{W,\tau}$ with respect to the
metric $\langle\,,\rangle^{W,\tau}$ on the Wasserstein space one has

$\frac{\partial}{\partial t} \nu_t = -\nabla^{W,\tau} \mathrm{Ent}(\nu_t) = \Delta^\tau(\nu_t/\mu)\, \mu$.

Proof. Denote by $(\nu, \eta)$ a tangent vector to the Wasserstein space $W$ at a point
$\nu \in W$, where $\eta$ is a volume form of total integral zero. Let $\Delta^\tau_\nu$ be the subriemannian
Laplacian with respect to the volume form $\nu$.

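Let $h$ and $h_{\mathrm{Ent}}$ be real-valued functions on the manifold $M$ such that $-(\Delta^\tau_\nu h)\nu = \eta$
and $-(\Delta^\tau_\nu h_{\mathrm{Ent}})\nu = \nabla^{W,\tau} \mathrm{Ent}(\nu)$ for the entropy functional Ent. Then, by definition
of the metric $\langle\,,\rangle^{W,\tau}$ given by (6.10), we have

(6.14) $\langle (\nu, \nabla^{W,\tau} \mathrm{Ent}(\nu)), (\nu, \eta) \rangle^{W,\tau} = \int_M \langle P^\tau \nabla h_{\mathrm{Ent}}(x), P^\tau \nabla h(x) \rangle^M \, \nu$.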

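On the other hand, by definitions of Ent and the gradient $\nabla^{W,\tau}$ on the Wasserstein
space, one has:

$\langle (\nu, \nabla^{W,\tau} \mathrm{Ent}(\nu)), (\nu, \eta) \rangle^{W,\tau} := \frac{d}{dt}\Big|_{t=0} \mathrm{Ent}(\nu + t\eta) = \frac{d}{dt}\Big|_{t=0} \int_M \Big[ \log\Big( \frac{\nu + t\eta}{\mu} \Big) \Big] (\nu + t\eta)$.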

After differentiation and simplification the latter expression becomes $\int_M \log(\nu/\mu)\, \eta$,
where we used that $\int_M \eta = 0$. This can be rewritten as

$\int_M \log(\nu/\mu)\, \eta = -\int_M \log(\nu/\mu)\, \mathcal{L}_{P^\tau \nabla h} \nu = \int_M (\mathcal{L}_{P^\tau \nabla h} \log(\nu/\mu))\, \nu$,
by using the Leibniz property of the Lie derivative $\mathcal{L}$ on the Wasserstein space and
the fact that $-(\Delta^\tau_\nu h)\nu = \eta$. Note that the Lie derivative is the inner product with
the gradient, and hence

$\int_M (\mathcal{L}_{P^\tau \nabla h} \log(\nu/\mu))\, \nu = \int_M \langle \nabla \log(\nu/\mu), P^\tau \nabla h \rangle^M \, \nu = \int_M \langle P^\tau \nabla \log(\nu/\mu), P^\tau \nabla h \rangle^M \, \nu$.

Comparing the latter form with (6.14), we get $P^\tau \nabla h_{\mathrm{Ent}} = P^\tau \nabla \log(\nu/\mu)$, or, after
taking the divergence of both parts and using the definition of the function $h_{\mathrm{Ent}}$,

$\nabla^{W,\tau} \mathrm{Ent}(\nu) = -\Delta^\tau_\nu(\log(\nu/\mu))\, \nu$.

Finally, let us show that the right-hand side of the above equation coincides with
$-\Delta^\tau(\nu/\mu)\, \mu$. Indeed, the chain rule gives
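
This step can be written out as follows. The shorthand $Y := P^\tau \nabla(\nu/\mu)$ is introduced here only for readability, and the display is a sketch consistent with the identities used in the rest of the proof rather than a verbatim formula.

```latex
% Shorthand (not in the original): Y := P^\tau \nabla(\nu/\mu).
% Chain rule: \nabla \log(\nu/\mu) = (\mu/\nu)\,\nabla(\nu/\mu), so
% P^\tau \nabla \log(\nu/\mu) = (\mu/\nu)\,Y. Since \nu is a top-degree form,
% \mathcal{L}_Z \nu = d\, i_Z \nu for any vector field Z, and therefore
\begin{align*}
  \mathcal{L}_{P^\tau \nabla \log(\nu/\mu)}\,\nu
    &= d\bigl( (\mu/\nu)\, i_Y \nu \bigr) \\
    &= (\mu/\nu)\, \mathcal{L}_Y \nu \;+\; d(\mu/\nu) \wedge i_Y \nu .
\end{align*}
```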

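The last term is equal to $(i_{P^\tau \nabla(\nu/\mu)}\, d(\mu/\nu))\, \nu = \mathcal{L}_{P^\tau \nabla(\nu/\mu)}(\mu/\nu)\, \nu$, which implies that

$\mathcal{L}_{P^\tau \nabla \log(\nu/\mu)}\, \nu = \mathcal{L}_{P^\tau \nabla(\nu/\mu)}\, \mu$

by the Leibniz property of the Lie derivative. Thus

$\Delta^\tau_\nu(\log(\nu/\mu))\, \nu = \operatorname{div}_\nu(P^\tau \nabla(\log(\nu/\mu)))\, \nu = \mathcal{L}_{P^\tau \nabla \log(\nu/\mu)}\, \nu = \mathcal{L}_{P^\tau \nabla(\nu/\mu)}\, \mu = \Delta^\tau(\nu/\mu)\, \mu$.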

The above shows that the nonholonomic heat equation is the gradient flow on the 
Wasserstein space for the same potential as the classical heat equation, but with 
respect to the nonholonomic Wasserstein metric. □ 
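
One standard consequence of such a gradient-flow structure, sketched here under the assumption that $\nu_t$ is a smooth positive solution, is that the entropy does not increase along the nonholonomic heat flow. The computation uses only the metric (6.10), the description of $\nabla^{W,\tau} \mathrm{Ent}$ obtained in the proof, and the equation $\partial_t \nu_t = -\nabla^{W,\tau} \mathrm{Ent}(\nu_t)$.

```latex
\begin{align*}
  \frac{d}{dt}\,\mathrm{Ent}(\nu_t)
    &= \bigl\langle (\nu_t, \nabla^{W,\tau}\mathrm{Ent}(\nu_t)),\,
                    (\nu_t, \partial_t \nu_t) \bigr\rangle^{W,\tau} \\
    &= -\bigl\langle (\nu_t, \nabla^{W,\tau}\mathrm{Ent}(\nu_t)),\,
                     (\nu_t, \nabla^{W,\tau}\mathrm{Ent}(\nu_t)) \bigr\rangle^{W,\tau} \\
    &= -\int_M \bigl\langle P^\tau \nabla \log(\nu_t/\mu),\,
                            P^\tau \nabla \log(\nu_t/\mu) \bigr\rangle^M \,\nu_t
       \;\le\; 0 .
\end{align*}
```

Only the horizontal part of the gradient of $\log(\nu_t/\mu)$ enters the dissipation rate, which reflects the fact that mass is transported along the distribution $\tau$.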